Abstract. Projective clustering is a problem of both theoretical and practical importance and has received a great deal of attention in recent years. Given a set of points $P$ in $\mathbb{R}^d$, projective clustering is to find a set $\mathcal{F}$ of $k$ lower dimensional $j$-flats so that the average distance (or squared distance) from the points in $P$ to their closest flats is minimized. Existing approaches for this problem are mainly based on adaptive/volume sampling or core-set techniques, which suffer from several limitations. In this paper, we present the first uniform random sampling based approach for this challenging problem and achieve linear time solutions for three cases: general projective clustering, regular projective clustering, and $L_\tau$ sense projective clustering. For the general projective clustering problem, we show that for any given small numbers $0 < \gamma, \epsilon < 1$, our approach first removes $\gamma|P|$ points as outliers and then determines $k$ $j$-flats to cluster the remaining points into $k$ clusters with an objective value no more than $(1 + \epsilon)$ times the optimal value for all points. For regular projective clustering, we demonstrate that when the input points satisfy a reasonable regularity assumption, our approach for the general case can be extended to yield a PTAS for all points. For $L_\tau$ sense projective clustering, we show that our techniques for both the general and regular cases can be naturally extended to any $1 \le \tau < \infty$. Our results are based on several novel techniques, such as slab partition, $\Delta$-rotation, symmetric sampling, and recursive projection, and can be easily implemented for applications.



1 Introduction 

Projective clustering for a set $P$ of $n$ points in $\mathbb{R}^d$ is to find a set $\mathcal{F}$ of $k$ lower dimensional $j$-flats so that the average distance (by a certain distance measure) from the points in $P$ to their closest flats is minimized. Depending on the choices of $j$ and $k$, the problem has quite a few different variants. For instance, when $k = 1$, the problem is to find a $j$-flat to fit a set of points and is often called the shape fitting problem. On the other hand, when $j = 1$, the problem is to find $k$ lines to cluster a point set, and thus is called $k$-line clustering. In this paper, we mainly consider $L_2$ sense projective clustering, i.e., minimizing the average squared distance to the resulting flats. We also consider extensions to regular projective clustering and $L_\tau$ sense projective clustering for any integer $1 \le \tau < \infty$, where regular projective clustering is for points whose projections on their optimal fitting flat have a bounded coefficient of variation along any direction.

Previous results: Projective clustering is related to many theoretical problems such as shape fitting and matrix approximation, as well as numerous applications in applied domains. Due to its importance in both theory and applications, a great deal of effort has been devoted in recent years to solving this challenging problem, and a number of promising techniques have been developed. From the methodology point of view, Agarwal et al. [1] first introduced a structure called kernel set for capturing the extent of a point set and used it to derive a number of algorithms related to the projective clustering problem. Har-Peled et al. [19, 20] presented algorithms for the shape fitting problem based on kernel sets and core-sets. The core-set concept has also been extended to more general projective clustering problems [13, 21, 28], and has proved to be effective for many other problems. Another main approach for projective clustering is dimension reduction through adaptive sampling [12, 27]. From the time efficiency point of view, most of the existing algorithms for projective clustering problems have super-linear dependency on the size $n$ of the point set. Several linear or near linear time (in $n$) algorithms were also previously presented. Agarwal et al. presented a near linear time algorithm for $k$-line clustering with the $L_\infty$ sense objective. In [13], Edwards and Varadarajan introduced a near linear time algorithm for integer points with the $L_\infty$ sense objective. In [28], Varadarajan and Xiao designed a near linear time algorithm for $k$-line clustering and general projective clustering on integer points with the $L_1$ sense objective. Furthermore, [14, 16] present linear time bicriteria approximation algorithms for the $L_1$, $L_2$, and $L_\infty$ senses.



Relations with subspace approximation: A problem closely related to $j$-flat fitting is the low rank matrix approximation problem, whose objective is to find a lower dimensional subspace, rather than a flat, to approximate the original matrix (which is basically a set of column points). For this problem, Frieze et al. introduced an elegant method based on random sampling [15]. Their method additively approximates the original matrix, but unfortunately is not a PTAS. To achieve a PTAS, Deshpande et al. presented a volume sampling based approach to generate $j$-subspaces [12]. Their algorithm works well for the single $j$-flat/subspace fitting problem, and can also be extended to the projective clustering problem (but with relatively high time complexity). Shyamalkumar et al. present an algorithm for subspace approximation with any $L_\tau$ sense objective, for $\tau \ge 1$ [27].

2 Main Results and Techniques

Definition 1 ($L_\tau$ Sense $(k, j)$-Projective Clustering and $j$-Flat Fitting). Given a point set $P$ in $\mathbb{R}^d$ and three integers $k \ge 1$, $1 \le j < d$, and $1 \le \tau < \infty$, an $L_\tau$ sense $(k, j)$-projective clustering is to find $k$ $j$-dimensional flats $\mathcal{F} = \{\mathcal{F}_1, \cdots, \mathcal{F}_k\}$ in $\mathbb{R}^d$ such that $\frac{1}{|P|}\sum_{p \in P} \min_{1 \le i \le k} \|p, \mathcal{F}_i\|^\tau$ is minimized, where $\|p, \mathcal{F}_i\|$ is the closest distance from $p$ to $\mathcal{F}_i$. When $k = 1$, it is a $j$-flat fitting problem.

In this paper, we assume both $k$ and $j$ are constant.
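
As an illustration of Definition 1, the following minimal sketch evaluates the objective for a given set of flats. The representation of a flat as a pair (a point on the flat, an orthonormal basis of its direction space) and the names `dist_to_flat` and `projective_clustering_cost` are assumptions made for this example only; they do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical representation: (a point on the flat, orthonormal direction basis).
Flat = Tuple[Vector, List[Vector]]


def dist_to_flat(p: Vector, flat: Flat) -> float:
    """Euclidean distance ||p, F|| from the point p to the flat F."""
    origin, basis = flat
    v = [pi - oi for pi, oi in zip(p, origin)]
    # Subtract the component of v along each (orthonormal) basis direction.
    for b in basis:
        c = sum(vi * bi for vi, bi in zip(v, b))
        v = [vi - c * bi for vi, bi in zip(v, b)]
    return math.sqrt(sum(vi * vi for vi in v))


def projective_clustering_cost(points, flats, tau: float = 2.0) -> float:
    """(1/|P|) * sum over p of min over flats of ||p, F_i||^tau."""
    total = sum(min(dist_to_flat(p, f) for f in flats) ** tau for p in points)
    return total / len(points)


if __name__ == "__main__":
    # Two horizontal lines (1-flats) in the plane: y = 0 and y = 3.
    flats = [((0.0, 0.0), [(1.0, 0.0)]), ((0.0, 3.0), [(1.0, 0.0)])]
    pts = [(1.0, 0.0), (2.0, 1.0), (0.0, 3.0), (5.0, 2.0)]
    print(projective_clustering_cost(pts, flats, tau=2.0))  # 0.5
```

With `tau=2.0` this is the $L_2$ sense objective that the paper mainly studies; passing a single flat gives the $j$-flat fitting objective.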

2.1 Main Results 

In this paper, we mainly focus on the case of $\tau = 2$ on arbitrary points (i.e., general projective clustering), and then extend the ideas to two other cases, regular projective clustering and $L_\tau$ sense projective clustering for any integer $1 \le \tau < \infty$. We present a uniform approach, purely based on random sampling, to achieve linear time solutions for all three cases.

- General $(k, j)$-projective clustering: For an arbitrary point set $P$ and small constant numbers $0 < \gamma, \epsilon < 1$, our approach leaves out a small portion (i.e., $\gamma|P|$) of the input points as outliers, and finds, in time linear in $nd$, $k$ $j$-flats to cluster the remaining points so that their objective value is no more than $(1 + \epsilon)$ times the optimal value on the whole set $P$. Our result relies on several novel techniques, such as symmetric sampling, slab partition, $\Delta$-rotation, and recursive projection.

- Regular projective clustering: When the input point set $P$ has a regular distribution on its clusters, our approach yields a PTAS solution for the whole point set $P$ in the same time bound. The regularity of $P$ is measured by the Coefficient of Variation (CV) of the projection of its points along any direction on their optimal fitting flat. $P$ is regular if the CV has a bounded value. Since many commonly encountered distributions, which are often used to model various data or noise in experiments, are regular (such as the Gaussian distribution, the Erlang distribution, etc.), our result has a wide range of potential applications.

- $L_\tau$ sense projective clustering: Our approach can also be extended to $L_\tau$ sense projective clustering for any $1 \le \tau < \infty$ with the same time bound. We show that each technique used for the general and regular projective clustering (i.e., the case of $\tau = 2$) can be extended to achieve similar results.

Comparisons with previous results: As mentioned earlier, existing works on projective clustering can be classified into two categories: (a) adaptive sampling (or volume sampling) based approaches [12, 27] and (b) core-set based approaches [13, 28]. Often, (a) can efficiently solve the single flat fitting problem (i.e., subspace approximation), but its extension to projective clustering requires a running time much higher than the desired (near) linear time. (b) can solve projective clustering in near linear time, but the input must consist of integer points within a polynomial range in every coordinate. The main advantages of our approach are: (1) it runs in linear time, (2) it needs no assumption on the input (if a small fraction of outliers is allowed), (3) it achieves a linear time PTAS for regular points, and (4) it is simple and can be easily implemented for applications.

2.2 Key Techniques 

Our approach is based on a key result in [22], which estimates the mean point of a large point set by a small random sample whose size is independent of the size and dimensionality of the original set. This result is widely used in many areas, especially in $k$-means clustering [23]-[25]. Since projective clustering is a generalization of $k$-means clustering, where the mean point is simply a 0-dimensional flat, it is desirable to generalize this uniform random sampling technique to the more general flat fitting and projective clustering problems (without relying on adaptive or volume sampling or core-set techniques).

To address this issue, we show that after taking a random sample $S$, it is impossible to generate a proper fitting flat if we simply compute the mean of $S$ as in [22]. Our key idea is to use the Symmetric Sampling technique to consider not only $S$, but also $-S$, which is the symmetric point set of $S$ with respect to the mean point $o$ of the input set $P$. Intuitively, if we enumerate the mean point of every subset of $S \cup -S$, there must exist one such point $p$ that is not only close to the optimal fitting flat, but also far away from $o$. This means that $p$ can define one dimension of the fitting flat, and thus we can reduce the $j$-flat fitting problem to a $(j-1)$-flat fitting problem by projecting all points to some $(d-1)$-dimensional subspace. If we recursively use this strategy $j$ times, which is called Recursive Projection, we can get one proper flat. This flat fitting technique extends naturally to projective clustering.

3 Hyperbox Lemma and Slab Partition 

In this section, we present two standalone results, Hyperbox Lemma and Slab Partition, which are used for proving our key theorem (i.e., Theorem 1) in Section 4.2.

Definition 2 (Slab and Amplification). Let $o$ and $s$ be two points in $\mathbb{R}^d$, and $\Omega$ and $-\Omega$ be the two hyperplanes perpendicular to the vector $\vec{os}$ and passing through $s$ and $-s$ respectively, where $-s$ is $s$'s symmetric point about $o$. The region bounded by $\Omega$ and $-\Omega$ is called the Slab determined by $\vec{os}$ (denoted as $\mathcal{R}$). Further, let $s'$ be a point collinear with $o$ and $s$ with $\frac{\|o - s'\|}{\|o - s\|} = \lambda$. Then the Slab $\mathcal{R}'$ determined by $\vec{os'}$ is called an amplification of $\mathcal{R}$ by a factor $\lambda$ (see Figure 1).

Fig. 1. An example illustrating Definition 2.

3.1 Hyperbox Lemma

Lemma 1 (Hyperbox Lemma). Let $\mathcal{H}$ be a hyperbox in $\mathbb{R}^j$, and $o$ be its center. Let $\{f_1, \cdots, f_j\}$ be $j$ facets (i.e., $(j-1)$-dimensional faces) of $\mathcal{H}$ with different normal directions (i.e., no pair are parallel to each other), and $\{p_1, \cdots, p_j\}$ be $j$ points with each $p_l$, $1 \le l \le j$, incident to $f_l$. Then there exists one point $p_{l_0}$ among them such that the slab determined by $\vec{op_{l_0}}$ contains $\mathcal{H}$ after amplifying by a factor no more than $\sqrt{j}$.

Fig. 2. An example illustrating Lemma 1.

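Proof. Let $a_1, \cdots, a_j$ be the $j$ side lengths of $\mathcal{H}$, where $a_l$ is the side length along the normal direction of $f_l$. For each $1 \le l \le j$, denote the slab determined by $\vec{op_l}$ as $\mathcal{R}_l$ (with two bounding hyperplanes $\Omega_l$ and $-\Omega_l$), and its minimal amplification, which is barely enough to contain $\mathcal{H}$, as $\mathcal{R}'_l$ (i.e., its two bounding hyperplanes $\Omega'_l$ and $-\Omega'_l$ support $\mathcal{H}$). Let $t_l$ be a point in $\Omega'_l \cap \mathcal{H}$ (i.e., a point on the (possibly 0-dimensional) touching face of $\Omega'_l$ and $\mathcal{H}$), and $p'_l$ be the intersection point of $\Omega'_l$ and the supporting line of $o$ and $p_l$ (see Figure 2). Then we have

$\|o - p'_l\| \le \|o - t_l\| \le \frac{1}{2}\sqrt{\sum_{i=1}^{j} a_i^2}$, and $\|o - p_l\| \ge \frac{1}{2}a_l$.

Thus, we know that the amplification factor satisfies

$\frac{\|o - p'_l\|}{\|o - p_l\|} \le \frac{\sqrt{\sum_{i=1}^{j} a_i^2}}{a_l}$.

Let $a_{l_0} = \max\{a_1, \cdots, a_j\}$. Then we have

$\frac{\|o - p'_{l_0}\|}{\|o - p_{l_0}\|} \le \frac{\sqrt{\sum_{i=1}^{j} a_i^2}}{a_{l_0}} \le \frac{\sqrt{j\,a_{l_0}^2}}{a_{l_0}} = \sqrt{j}$.

Thus the lemma is true. □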
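
The Hyperbox Lemma can be checked numerically. The sketch below is only an illustration under simplifying assumptions that are not in the paper: the hyperbox is axis-aligned and centered at the origin $o$, and it is described by its half side lengths. For such a box, the slab determined by $\vec{op}$ is $\{x : |\langle x, p\rangle| \le \|p\|^2\}$, so the smallest amplification factor that covers the box is $\sum_i h_i|p_i| / \|p\|^2$. The function names are hypothetical.

```python
import math
import random


def amplification_to_cover_box(p, half_sides):
    """Smallest factor by which the slab determined by the vector o->p
    (o = origin) must be amplified to contain the box prod_i [-h_i, h_i].
    A value below 1 means the slab already contains the box."""
    norm_sq = sum(x * x for x in p)
    # Largest value of |<x, p>| over the box is sum_i h_i * |p_i|.
    support = sum(h * abs(x) for h, x in zip(half_sides, p))
    return support / norm_sq


def random_facet_points(half_sides, rng):
    """One random point on each facet x_l = +h_l (facets with distinct normals)."""
    points = []
    for l, h in enumerate(half_sides):
        p = [rng.uniform(-hi, hi) for hi in half_sides]
        p[l] = h  # force the point onto the l-th facet
        points.append(p)
    return points


def best_facet_point(points, half_sides):
    """Index of the point needing the least amplification, and that factor."""
    factors = [amplification_to_cover_box(p, half_sides) for p in points]
    best = min(range(len(points)), key=factors.__getitem__)
    return best, factors[best]


if __name__ == "__main__":
    rng = random.Random(0)
    half = [rng.uniform(0.1, 10.0) for _ in range(5)]
    pts = random_facet_points(half, rng)
    idx, factor = best_facet_point(pts, half)
    print(idx, factor, "bound:", math.sqrt(len(half)))
```

In every run the best factor stays at or below $\sqrt{j}$, as the lemma predicts; the point on the facet of the longest side always satisfies the bound, although another point may do even better.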



3.2 Slab Partition 

Definition 3 (Slab Partition). Let $o$ be the origin of $\mathbb{R}^j$, and $\vec{ou_1}, \cdots, \vec{ou_j}$ be the $j$ orthogonal vectors defining the coordinate system of $\mathbb{R}^j$. The following partition is called a Slab Partition on $\mathbb{R}^j$: $\Pi_l = \Pi_{l-1} \cap \mathcal{R}_l$ for $1 \le l \le j$, where $\Pi_0 = \mathbb{R}^j$, $\mathcal{R}_l$ is the Slab determined by $\vec{ou'_l}$, and $u'_l$ is some point on the ray of $\vec{ou_l}$ (see Figure 3).




Fig. 3. An example of 2D slab partition.

Fig. 4. The right cuboid is enlarged from the left one with respect to $p_l$.

Lemma 2. Let $\{\Pi_0, \Pi_1, \cdots, \Pi_j\}$ be a slab partition in $\mathbb{R}^j$, and $\{\mathcal{R}_1, \cdots, \mathcal{R}_j\}$ be the corresponding partitioning slabs. Let $\{p_1, \cdots, p_j\}$ be $j$ points such that $p_l \in \Pi_{l-1} \cap \partial\mathcal{R}_l$ for $1 \le l \le j$, where $\partial\mathcal{R}_l$ is the bounding hyperplane of $\mathcal{R}_l$. Then there exists a point $p_{l_0}$ such that the slab determined by $\vec{op_{l_0}}$ contains $\Pi_j$ after being amplified by a factor of $\sqrt{j}$.

Proof. It is easy to see that $\Pi_j = \cap_{l=1}^{j}\mathcal{R}_l$, which is a hyperbox in $\mathbb{R}^j$. Thus, it is natural to use Lemma 1 to prove the lemma. For this purpose, we let $S_l$, $1 \le l \le j$, be one of the bounding hyperplanes of $\mathcal{R}_l$ with $p_l$ incident to it. For any $w < l$, from the slab partition we know that the whole $\Pi_l$ is located inside $\mathcal{R}_w$. Thus, $p_l$ is also located inside $\mathcal{R}_w$. Let $f_l = \Pi_j \cap S_l$. Thus the $j$ facets $\{f_1, \cdots, f_j\}$ of $\Pi_j$ have different normal directions. Note that since $f_l$ is only a subregion of $S_l$, $p_l$ is possibly outside of $f_l$. Thus we consider the following two cases: (a) every $p_l$ is located inside $f_l$ for $1 \le l \le j$, and (b) there exists some $p_l$ located outside of $f_l$.

For case (a), the lemma follows from Lemma 1 after replacing $\mathcal{H}$ by $\Pi_j$. For case (b), our idea is to reduce it to case (a) through the following procedure.

1. Initialize a set of points $\{\tilde{p}_1, \cdots, \tilde{p}_j\}$ with $\tilde{p}_l = p_l$ for $1 \le l \le j$.

2. Set $l = 1$. Do the following steps until $l > j$.

(a) Set $w = l + 1$. Do the following steps until $w > j$.

i. If $\tilde{p}_l$ is outside of $\mathcal{R}_w$, first amplify $\mathcal{R}_w$ until it touches $\tilde{p}_l$ (see Fig. 4), and then set $\tilde{p}_w = \tilde{p}_l$.

ii. $w = w + 1$.

(b) $l = l + 1$.

Claim. After the above procedure, $\{\tilde{p}_1, \cdots, \tilde{p}_j\}$ becomes a case (a) set with respect to the amplified $\Pi_j = \cap_{l=1}^{j}\mathcal{R}_l$.

To show this claim, we observe that there are two loops in the procedure. In the first loop, each $l$-th round guarantees that $\tilde{p}_l$ is located inside (or on the boundary of) the $l$-th facet of the enlarged $\Pi_j$. Note that $\tilde{p}_l$ is always inside $\mathcal{R}_w$ for $w < l$. Thus the second loop only starts from $w = l + 1$. After amplifying $\mathcal{R}_w$, the original $\tilde{p}_w$ will no longer be on $\partial\mathcal{R}_w$. Thus replacing $\tilde{p}_w$ by $\tilde{p}_l$ will keep it on $\partial\mathcal{R}_w$. Thus, after finishing the two loops, $\{\tilde{p}_1, \cdots, \tilde{p}_j\}$ will become a case (a) set with respect to the new $\Pi_j$.

Note that in case (b), the resulting $\{\tilde{p}_1, \cdots, \tilde{p}_j\}$ is actually a subset of the original $\{p_1, \cdots, p_j\}$. Thus, we do not really need to perform the procedure to complete the reduction. We only need to find the desired $p_{l_0}$ whose existence is ensured by Lemma 1. Thus, the lemma holds. □

4 Δ-Rotation and Symmetric Sampling

This section introduces several key techniques used in our algorithms. Let $\mathcal{F}$ be a $j$-dimensional flat and $P$ be a set of $\mathbb{R}^d$ points. We denote the average squared distance from $P$ to $\mathcal{F}$ as $\delta^2_{P,\mathcal{F}} = \frac{1}{|P|}\sum_{p \in P}\|p, \mathcal{F}\|^2$, where $\|p, \mathcal{F}\|$ is the closest distance from $p$ to $\mathcal{F}$.

4.1 Flat Rotation and Δ-Rotation

In this section, we discuss flat rotation, and how it affects single flat fitting.




Fig. 5. An example illustrating Definition 4.

Fig. 6. An example illustrating Definition 5.



Definition 4 (Flat Rotation). Let $\mathcal{F}$ be a $j$-dimensional flat in $\mathbb{R}^d$, $o$ be a point on $\mathcal{F}$, and $u$ be any given point in $\mathbb{R}^d$. Let $Proj(u)$ denote the orthogonal projection of $u$ on $\mathcal{F}$, and $F$ denote the $(j-1)$-dimensional face of $\mathcal{F}$ which is perpendicular to the vector $Proj(u) - o$. Then the flat $\mathcal{F}'$ spanned by $F$ and the vector $u - o$ is a rotation of $\mathcal{F}$ induced by the vector $u - o$, and the rotation angle $\theta$ is the angle between $u - o$ and $Proj(u) - o$ (see Fig. 5).

In the above definition, when there is no ambiguity about $o$, we also say that the rotation is induced by $u$.

Definition 5 (Δ-Rotation). Let $P$ be a point set and $\mathcal{F}$ be a $j$-dimensional flat in $\mathbb{R}^d$. Let $o$ be a point on $\mathcal{F}$, $u$ be any given point in $\mathbb{R}^d$, and $h^2 = \frac{1}{|P|}\sum_{p \in P}\left|\left\langle p - o, \frac{Proj(u) - o}{\|Proj(u) - o\|}\right\rangle\right|^2$, where $Proj(u)$ is the orthogonal projection of $u$ on $\mathcal{F}$, and $\langle a, b\rangle$ denotes the inner product of $a$ and $b$. Let $\mathcal{F}'$ be a rotation of $\mathcal{F}$ induced by the vector $u - o$ with angle $\theta$. Then it is a $\Delta$-rotation with respect to $P$ if $\theta \le \arctan\frac{\Delta}{h}$.

In the above definition, $h^2$ is the average squared projection length of each $p - o$ along the direction of $Proj(u) - o$. Figure 6 shows an example of $\Delta$-rotation. The following lemma shows how the average squared distance $\delta^2_{P,\mathcal{F}}$ from $P$ to $\mathcal{F}$ (i.e., $\frac{1}{|P|}\sum_{p \in P}\|p, \mathcal{F}\|^2$) changes after a $\Delta$-rotation.

Lemma 3. Let $P$ be a point set in $\mathbb{R}^d$, $\mathcal{F}$ be a $j$-dimensional flat, and $u$ be a point in $\mathbb{R}^d$. If $\mathcal{F}'$ is a $\Delta$-rotation (with respect to $P$) of $\mathcal{F}$ induced by the vector $u - o$ for some point $o \in \mathcal{F}$, then

$\delta_{P,\mathcal{F}'} \le \delta_{P,\mathcal{F}} + \Delta$.

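Proof. We use the same notations as in Definition 5. For any $p \in P$, we let $u_p$ denote $\left|\left\langle p - o, \frac{Proj(u) - o}{\|Proj(u) - o\|}\right\rangle\right|$, and $Proj(p)$ denote its orthogonal projection on $\mathcal{F}$. Then by the triangle inequality, we have

$\|p, \mathcal{F}'\| \le \|p - Proj(p)\| + \|Proj(p), \mathcal{F}'\| = \|p, \mathcal{F}\| + \|Proj(p), \mathcal{F}'\|$. (1)

Meanwhile, since the rotation angle from $\mathcal{F}$ to $\mathcal{F}'$ is $\theta \le \arctan\frac{\Delta}{h}$, we have $\|Proj(p), \mathcal{F}'\| = u_p\sin\theta \le u_p\tan\theta \le \frac{\Delta}{h}u_p$. Plugging this into inequality (1) and writing $\delta$ for $\delta_{P,\mathcal{F}}$, we get

$\|p, \mathcal{F}'\|^2 \le (\|p, \mathcal{F}\| + u_p\sin\theta)^2 = \|p, \mathcal{F}\|^2 + 2u_p\sin\theta\,\|p, \mathcal{F}\| + (u_p\sin\theta)^2$

$\le \|p, \mathcal{F}\|^2 + 2\frac{\Delta}{h}u_p\|p, \mathcal{F}\| + \left(\frac{\Delta}{h}u_p\right)^2 = \|p, \mathcal{F}\|^2 + 2\frac{\Delta}{\delta}\cdot\frac{\delta}{h}u_p\|p, \mathcal{F}\| + \left(\frac{\Delta}{h}u_p\right)^2$

$\le \left(1 + \frac{\Delta}{\delta}\right)\|p, \mathcal{F}\|^2 + \left(\frac{\Delta}{\delta} + \left(\frac{\Delta}{\delta}\right)^2\right)\left(\frac{\delta}{h}u_p\right)^2$, (2)

where the inequality on the second line follows from $\sin\theta \le \frac{\Delta}{h}$, and the inequality on the third line follows from the fact that $2ab \le a^2 + b^2$ for any pair of real numbers $a$ and $b$. Summing both sides of (2) over $p$, we have

$\sum_{p \in P}\|p, \mathcal{F}'\|^2 \le \left(1 + \frac{\Delta}{\delta}\right)\sum_{p \in P}\|p, \mathcal{F}\|^2 + \left(\frac{\Delta}{\delta} + \left(\frac{\Delta}{\delta}\right)^2\right)\sum_{p \in P}\left(\frac{\delta}{h}u_p\right)^2$

$= \left(1 + \frac{\Delta}{\delta}\right)\sum_{p \in P}\|p, \mathcal{F}\|^2 + \left(\frac{\Delta}{\delta} + \left(\frac{\Delta}{\delta}\right)^2\right)\frac{\delta^2}{h^2}\sum_{p \in P}u_p^2$. (3)

Since $h^2 = \frac{1}{|P|}\sum_{p \in P}u_p^2$, $\delta^2_{P,\mathcal{F}} = \frac{1}{|P|}\sum_{p \in P}\|p, \mathcal{F}\|^2$, and $\delta^2_{P,\mathcal{F}'} = \frac{1}{|P|}\sum_{p \in P}\|p, \mathcal{F}'\|^2$, (3) becomes

$\delta^2_{P,\mathcal{F}'} \le \left(1 + \frac{\Delta}{\delta}\right)\delta^2 + \left(\frac{\Delta}{\delta} + \left(\frac{\Delta}{\delta}\right)^2\right)\delta^2 = (\delta_{P,\mathcal{F}} + \Delta)^2$. (4)

Thus, the lemma is true. □
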
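
Lemma 3 can be sanity-checked on a concrete instance. The sketch below is an illustration under assumptions chosen for simplicity (they are not part of the paper): the ambient space is $\mathbb{R}^3$, $\mathcal{F}$ is the plane $z = 0$, $o$ is the origin, and $u = (a, 0, c)$ with $a \ne 0$. Then $Proj(u) = (a, 0, 0)$, the rotated flat $\mathcal{F}'$ is spanned by the $y$-axis and $u$, $\tan\theta = |c|/|a|$, and the smallest admissible value is $\Delta = h\tan\theta$. The function name is hypothetical.

```python
import math
import random


def delta_rotation_check(points, a, c):
    """Return (delta_{P,F}, delta_{P,F'}, Delta) for F = {z = 0}, o = origin,
    and the rotation induced by u = (a, 0, c), with a != 0."""
    n = len(points)
    # Root of the average squared distance to the plane z = 0.
    delta_f = math.sqrt(sum(z * z for _, _, z in points) / n)
    # F' is spanned by the y-axis and u, so its unit normal is (-c, 0, a)/||u||.
    norm_u = math.hypot(a, c)
    delta_f_rot = math.sqrt(
        sum(((-c * x + a * z) / norm_u) ** 2 for x, _, z in points) / n
    )
    # h: root of the average squared projection length along Proj(u) - o.
    h = math.sqrt(sum(x * x for x, _, _ in points) / n)
    # Smallest Delta with theta <= arctan(Delta / h), since tan(theta) = |c|/|a|.
    big_delta = h * abs(c) / abs(a)
    return delta_f, delta_f_rot, big_delta


if __name__ == "__main__":
    rng = random.Random(7)
    pts = [(rng.gauss(0, 3), rng.gauss(0, 3), rng.gauss(0, 0.5)) for _ in range(300)]
    d_f, d_rot, big_delta = delta_rotation_check(pts, a=2.0, c=-1.0)
    print(d_rot, "<=", d_f + big_delta)
```

The printed left-hand side never exceeds the right-hand side, in line with $\delta_{P,\mathcal{F}'} \le \delta_{P,\mathcal{F}} + \Delta$.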
4.2 Symmetric Sampling

Algorithm Symmetric-Sampling

Input: A set $S = \{s_1, s_2, \ldots, s_m\}$ of $\mathbb{R}^d$ points and a single point $o$ in $\mathbb{R}^d$.
Output: A new point set $\tilde{S}$.

1. Initialize $\tilde{S} = \emptyset$.

2. Construct a new point set $-S = \{2o - s_1, \cdots, 2o - s_m\}$, which is the set of symmetric points of $S$ (i.e., symmetric about $o$).

3. For each subset of $S \cup -S$, add its mean point into $\tilde{S}$.
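
The three steps above can be sketched directly. This is a minimal illustration, not the authors' implementation: the function name `symmetric_sampling` and the tuple representation of points are assumptions, and the empty subset is skipped because it has no mean point. The output has $2^{2m} - 1$ points, so the sketch is only practical for small samples.

```python
from itertools import combinations


def symmetric_sampling(sample, o):
    """Return the mean point of every non-empty subset of S ∪ -S,
    where -S is the reflection of the sample S about the point o."""
    mirrored = [tuple(2 * oi - si for si, oi in zip(s, o)) for s in sample]
    pool = [tuple(s) for s in sample] + mirrored
    d = len(o)
    means = []
    for size in range(1, len(pool) + 1):
        for subset in combinations(pool, size):
            means.append(tuple(sum(p[i] for p in subset) / size for i in range(d)))
    return means


if __name__ == "__main__":
    candidates = symmetric_sampling([(1.0, 0.0), (0.0, 2.0)], o=(0.0, 0.0))
    print(len(candidates))  # 15 = 2**4 - 1
```

Each candidate is a possible direction (from $o$) for one dimension of the fitting flat, which is how the output is used in the analysis that follows.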

Below is the main theorem about Algorithm Symmetric-Sampling. 

Theorem 1. Let P be a set o/ M'^ points, and S be its random sample of size r = ^In^, where 
$0 < \gamma < 1$ and $\epsilon > 0$ are two small numbers. Let $\mathcal{F}$ be a $j$-dimensional flat, $o$ be a given point on $\mathcal{F}$, and $\tilde{S}$ be the set of points returned by Algorithm Symmetric-Sampling on $S$ and $o$. Then with probability $(1 - \epsilon)^4$, $\tilde{S}$ contains one point $s$ such that the hyperplane $\mathcal{F}'$ rotated from $\mathcal{F}$ and induced by $\vec{os}$ satisfies the following inequality: $|P'|\,\delta^2_{P',\mathcal{F}'} \le (1 + 5\sqrt{j}\,r)^2\,|P|\,\delta^2_{P,\mathcal{F}}$, where $P'$ is a subset of $P$ with size $|P'| \ge (1 - \frac{\gamma}{j})|P|$.

Before proving Theorem 1, we first introduce the following two lemmas.

Lemma 4. Let S be a set of n elements, and S' be a subset of S with size \S'\ = an. Lf randomly 
select in(/^_Q,) In ~ = 0(Mn|) elements from S, with probability at least 1 — rj, the sample contains 
at least t elements from S' (see Appendix for proof). 



The following lemma has been proved in [22].

Lemma 5 ([22]). Let $S$ be a set of $n$ points in $\mathbb{R}^d$, and $T$ be a subset with cardinality $m$ randomly selected from $S$. Let $x(S)$ and $x(T)$ be the mean points of $S$ and $T$ respectively. With probability $1 - \eta$, $\|x(S) - x(T)\|^2 < \frac{1}{\eta m}Var^0(S)$, where $Var^0(S) = \frac{1}{n}\sum_{s \in S}\|s - x(S)\|^2$.
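
Lemma 5 is easy to probe empirically. The following sketch is a small Monte Carlo experiment under assumptions made for illustration only: the data are synthetic Gaussian points, the subset $T$ is drawn uniformly without replacement, and the function names are hypothetical. It estimates how often $\|x(S) - x(T)\|^2 < \frac{1}{\eta m}Var^0(S)$ holds.

```python
import random


def mean(points):
    """Coordinate-wise mean point of a non-empty list of points."""
    n, d = len(points), len(points[0])
    return [sum(p[i] for p in points) / n for i in range(d)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def success_rate(points, m, eta, trials, rng):
    """Fraction of trials in which a random m-subset T satisfies
    ||x(S) - x(T)||^2 < Var0(S) / (eta * m)."""
    center = mean(points)
    var0 = sum(sq_dist(p, center) for p in points) / len(points)
    hits = 0
    for _ in range(trials):
        subset = rng.sample(points, m)  # uniform subset, without replacement
        if sq_dist(mean(subset), center) < var0 / (eta * m):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0, 1), rng.gauss(5, 2)) for _ in range(500)]
    print(success_rate(data, m=20, eta=0.2, trials=300, rng=rng))
```

The observed rate should be at least $1 - \eta$; in practice it is usually much higher, since the lemma follows from a Markov-type bound. Note that the sample size $m$ needed for a given accuracy does not depend on $n$ or $d$.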

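Proof (of Theorem 1). First, imagine that we can show the existence of (1) a subset $P' \subseteq P$ with size $|P'| \ge (1 - \frac{\gamma}{j})|P|$ and (2) a $j$-flat $\mathcal{F}'$ which is a $\Delta$-rotation (with $\Delta = \frac{5\sqrt{j}\,r}{\sqrt{1 - \gamma/j}}\delta_{P,\mathcal{F}}$) of $\mathcal{F}$ with respect to $P'$. Then by Lemma 3 we have $\delta_{P',\mathcal{F}'} \le \delta_{P',\mathcal{F}} + \Delta$. Meanwhile, since $\sum_{p \in P'}\|p, \mathcal{F}\|^2 \le \sum_{p \in P}\|p, \mathcal{F}\|^2$ and $|P'| \ge (1 - \frac{\gamma}{j})|P|$, we have $\delta_{P',\mathcal{F}} \le \frac{1}{\sqrt{1 - \gamma/j}}\delta_{P,\mathcal{F}}$. Combining the two inequalities, we have $\delta_{P',\mathcal{F}'} \le \left(\frac{1}{\sqrt{1 - \gamma/j}} + \frac{5\sqrt{j}\,r}{\sqrt{1 - \gamma/j}}\right)\delta_{P,\mathcal{F}} \Longrightarrow |P'|\,\delta^2_{P',\mathcal{F}'} \le (1 + 5\sqrt{j}\,r)^2\,|P|\,\delta^2_{P,\mathcal{F}}$. This means that we only need to focus on proving the existence of such $P'$ and $\mathcal{F}'$.

Without loss of generality, we assume that the given point $o$ is the origin. To prove the theorem, we first assume that all points of $P$ are located on $\mathcal{F}$, which would be the case if we project all points of $P$ onto $\mathcal{F}$; equivalently, each point in this case can be viewed as the projection of some point of $P$ in $\mathbb{R}^d$.

Now we consider the case that $P$ is in $\mathcal{F}$. First, we construct a slab partition $\{\Pi_0, \Pi_1, \cdots, \Pi_j\}$ for the $j$-dimensional subspace spanned by $\mathcal{F}$, with $\{\mathcal{R}_1, \cdots, \mathcal{R}_j\}$ being the corresponding slabs, such that $|P \cap \Pi_l| = (1 - \frac{\gamma}{j^2})|P \cap \Pi_{l-1}|$. Clearly, this can be easily obtained by iteratively (starting from $l = 1$) selecting the slab $\mathcal{R}_l$ as the one which excludes the $\frac{\gamma}{j^2}|P \cap \Pi_{l-1}|$ points whose $l$-th coordinates have the largest absolute values.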

From the slab partition, it is easy to see that for any < Z < j, ^^'^p^''^ — ~ jiY ^ 1 ~ ^) 

and ^^^*"^'p|^^^'^^ ^ p(l — j)- Thus, if we set t = ^ and V — ^^'^ use Lemma |4j to take a random 

tin - - In ■' 

sample from P of size = l-^'/j ; with probability (1 — > 1 — e, the sample contains at 

least ^ points from each P n 77;. Also, if we set t = 1, and use Lemma [i] to take a random sample 

t In — \yi 

from P of size [^/p){iL^ij) = WfPW-iTT)^ with probability (1 — j)-' > 1 — e, the sample contains 
at least 1 point from each P n {IIi-i \LIi). This means that if we take a random sample S of size 

^ In i > max{ ^„^^ ; {i / / j) ^ '' ^^^"^ with probability (1 — e)^, we have ^ In > \Ai\ > ^ and 
\Ai\ni\>l for any / (by the fact that |5 n P n (iTi_i \ 71;)! > 1), where Ai = Sr\{Pr\ 
Let $l(p)$ be the $l$-th coordinate of a point $p$, $B_l = \{p \mid p \in -A_l \cup A_l,\ l(p) > 0\}$, and $\rho_l$ be the mean point of $B_l$ for $1 \le l \le j$. Note that $\rho_l$ is contained in the output of Algorithm Symmetric-Sampling.

We define another sequence of slabs $\{\mathcal{R}'_1, \cdots, \mathcal{R}'_j\}$, where each $\mathcal{R}'_l$ is the slab parallel to $\mathcal{R}_l$ and with $\rho_l$ incident to one of its bounding hyperplanes. These slabs induce another slab partition on $\mathcal{F}$ with $\Pi'_l = \Pi'_{l-1} \cap \mathcal{R}'_l$. By Lemma 2, we know that there exists one point $\rho_{l_0}$ such that the slab determined by $\vec{o\rho_{l_0}}$ contains $\Pi'_j$ after being amplified by a factor of $\sqrt{j}$.

By the fact that \Ai\ni\ > 1 and the symmetric property of —AiUAi^ we know that \Bi\ni\ > 1. 

a' 1 

Denote the width of 77; and n[ in the /-th dimension as a; and a[ respectively. Then, ^ > j-g^ > ^ .^^ j 

(note that \Bi\ = \Ai\). Let r = — Then, il; and 77^ differ in width (in the l-th dimension) by 
a factor no more than r. Thus, by the slab partition procedure, the difference of the width between 
i7j and TTj is no more than a factor of r in any direction on J^. As a result, the slab determined by 
of)iQ contains Uj after amplified by a factor of Vjr. 

Now, we come back to the case that $P$ is located in $\mathbb{R}^d$ rather than in $\mathcal{F}$. First we let $P' = P \cap \Pi_j$. Then, we have $|P'| \ge (1 - \frac{\gamma}{j})|P|$. Note that we can always use the projection of $P$ whenever it is not in $\mathcal{F}$. Thus we can still select $P'$ by using the slab partition on $\mathcal{F}$. This ensures the existence of $P'$. Next, we prove the existence of $\mathcal{F}'$.

In this case, p/g may not be in F. We will prove the rotation induced by o^/p is a Z\-rotation of 

T with respect to P n Uj. We let Proj{pi^) denote the projection of pi^ on J^, it = \\^^r°oj{p°\-o\ \ ' 

and h'^ = jp^J2peP' I < Pi > By the way how p;,, is generated, we know that \\Proj{pig) — 

o\\ > ;^maxpgp/{| < p,lt > \} > -^j^h- Thus, in order to prove it is a — ^^=(5p^jr-rotation, we 

just need to prove ||/3;„ — Proj(/3;„)|| < ^ ^pt- Recall that p;„ is the mean point of Bi , and 

|i?;„| = \Aiq\ > J. Then we have the following claim (see Appendix for proof). 

Claim (1). With probability (1 — e)^, Wpi^ — Proj{pi^^)\\ < 55p^jr. 

Claim (1) implies that \ \pig — Proj{pig)\\ < 55pjr < , ^ ^pt^ which means that the hyperplane 



:(5p jr-rotated from T is the desired J^' . 



As for the success probability, since the success probability of containing $\rho_{l_0}$ is $(1 - \epsilon)^2$ as shown in the previous analysis, and the success probability for Claim (1) is also $(1 - \epsilon)^2$, the success probability for Theorem 1 is thus $(1 - \epsilon)^4$. □

5 Approximation Algorithm for Projective Clustering 

This section presents a $(1 + \epsilon)$-approximation algorithm for the projective clustering problem. For ease of understanding, we first give an outline of the algorithm.

5.1 Algorithm Outline 

As mentioned in the previous section, the objective of our approximation algorithm for the projective clustering problem is to determine $k$ $j$-dimensional flats such that $(1 - \gamma)|P|$ of the input points in $P$ can be fit into the obtained $k$ flats, and the total objective value is no more than $(1 + \epsilon)\delta^2_{opt}$, where $\delta^2_{opt}$ is the objective value of an optimal solution for all points in $P$. Let $\mathcal{C} = \{C_1, \cdots, C_k\}$ be the $k$ clusters in an optimal solution for $P$. Our approach only considers those clusters (called large clusters) in $\mathcal{C}$ with size at least $\frac{\gamma}{2k}|P|$, since the union of the remaining clusters has a total size no more than $\frac{\gamma}{2}|P|$. Also, for each large cluster $C_i$, our approach generates a flat to fit $(1 - \frac{\gamma}{2})|C_i|$ of its points. Thus, in total we fit at least $(1 - \frac{\gamma}{2})^2|P| \ge (1 - \gamma)|P|$ points to the resulting $k$ flats.

Consider a large cluster $C_i$. Let $\mathcal{F}_i$ be its optimal fitting flat. It is easy to see that $\mathcal{F}_i$ passes through the mean point $o_i$ of $C_i$. Let $\delta_i^2 = \frac{1}{|C_i|}\sum_{p \in C_i}\|p, \mathcal{F}_i\|^2$ be the optimal objective value of $C_i$. To emulate the behavior of $C_i$, we first assume that we know the exact position of $o_i$. If we run Algorithm Symmetric-Sampling on $o_i$ and a random sample of $C_i$ (note that since $C_i$ is a large cluster, by Lemma 4 we can obtain enough points from $C_i$ by randomly sampling $P$ directly), by Theorem 1 we can get a point $s$ such that $\vec{o_is}$ induces a $\Delta$-rotation for $\mathcal{F}_i$ with respect to a subset of $C_i$ with at least $(1 - \frac{\gamma}{2j})|C_i|$ points, where $\Delta = \frac{5\sqrt{j}\,r}{\sqrt{1 - \gamma/(2j)}}\delta_i$. If we recursively run Algorithm Symmetric-Sampling $j$ times, we obtain a sequence of $\Delta$-rotations and $j$ vectors which form a $j$-dimensional flat $\mathcal{F}'_i$ such that $\sum_{p \in C'_i}\|p, \mathcal{F}'_i\|^2 \le (1 + 5\sqrt{j}\,r)^{2j}\sum_{p \in C_i}\|p, \mathcal{F}_i\|^2$ (i.e., $\mathcal{F}'_i$ induces a $(1 + 5\sqrt{j}\,r)^{2j}$-approximation for $C_i$), where $C'_i$ is a subset of $C_i$ with at least $(1 - \frac{\gamma}{2j})^j|C_i| \ge (1 - \frac{\gamma}{2})|C_i|$ points. Since Algorithm Symmetric-Sampling enumerates all subsets of $-S \cup S$, the above recursive procedure (called Algorithm Recursive-Projection; see Section 5.2) forms a hierarchical tree.

From the proof of Theorem [2| we will know that the approximation ratio (i.e., (1 + 5\/jr)'^^) is 

solely determined by the Z\-rotation. In other words, if we could reduce the value of A from 6i 

Vi-7/i 

to ^6i, the approximation ratio would be reduced to (1 + ^)^-' ~ 1 + e. To achieve this, our idea is 
to find a point closer to Ti to induce the desired Z\-rotation. Our idea is to draw a ball B centered 
at every candidate point which induces the Z\-rotation, and build a grid inside B. The grid ensures 
the existence of one grid point close enough to By using the linear time dimension reduction 
technique in we can reduce the dimensionality of the projective clustering problem from d to 
^M^O(i)^ This enables us to reduce the complexity of the grid. Thus, the approximation ratio can be 
reduced to 1 + e in linear time. 

Now, the only remaining issue is how to find the exact position of $o_i$. From Lemmas 5 and 4, we can find an approximate mean point $\tilde{o}_i$ for $o_i$. Thus, by translating $\mathcal{F}_i$ to pass through $\tilde{o}_i$, we can show that $dist\{\tilde{\mathcal{F}}_i, \mathcal{F}_i\} \le O(\epsilon)\frac{1}{|C_i|}\sum_{p \in C_i}\|p, \mathcal{F}_i\|^2$. Combining this with the following Lemma 6, we can obtain a good approximation $\tilde{\mathcal{F}}_i$ for $\mathcal{F}_i$.

Lemma 6. Let $P$ be a point set, $\mathcal{F}$ be a $j$-dimensional flat, and $\tilde{\mathcal{F}}$ be a translation of $\mathcal{F}$ in $\mathbb{R}^d$. Then, $\frac{1}{|P|}\sum_{p \in P}\|p, \tilde{\mathcal{F}}\|^2 \le \frac{1}{|P|}\sum_{p \in P}\|p, \mathcal{F}\|^2 + dist\{\mathcal{F}, \tilde{\mathcal{F}}\}$.

5.2 Recursive Projection Algorithm

Algorithm Recursive-Projection

Input: A point set $P$ and a single point $o \in \mathbb{R}^d$, $0 < \gamma < 1$, and $1 \le j < d$.

Output: A tree $T$ of height $j$ with each node $v$ associated with a point $t_v$ and a flat $f_v$ in $\mathbb{R}^d$.

1. Initialize $T$ as a tree with a single root node associated with no point. The flat associated with the root is $\mathbb{R}^d$.

2. Starting from the root, for each node $v$, grow it in the following way.

(a) If the height of $v$ is $j$, it is a leaf node.

(b) Otherwise,

i. Project $P$ onto $f_v$, and denote the projection as $\hat{P}$.

ii. Take a random sample $Q$ from $\hat{P}$ with size $r$ (the sample size used in Theorem 1), and run Algorithm Symmetric-Sampling on $Q$ and $o$ to obtain $2^{2r}$ points $\tilde{Q}$ as the output.

iii. Create $2^{2r}$ children for $v$, with each child associated with one point in $\tilde{Q}$.

iv. Let $o$ be the mean point of $Q$. For each child $u$, let $t_u$ be its associated point; associate with $u$ the flat which is the subspace of $f_v$ perpendicular to $\vec{ot_u}$ and with one less dimension than $f_v$.

Running Time: It is easy to see that there are $O(2^{2rj})$ nodes in the output tree $T$. Each node costs $O(2^{2r}nd)$ time. Thus, the total running time is $O(2^{2r(j+1)}nd)$.
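
The projection step that moves from a node to one of its children can be sketched as follows. This is a hedged illustration rather than the full algorithm: it shows only how points are projected onto the hyperplane through $o$ that is perpendicular to $\vec{ot}$, and the name `project_out` is hypothetical. Applying it along a root-to-leaf path removes one direction per level, assuming each chosen $t$ already lies in the current flat.

```python
def project_out(points, o, t):
    """Project each point onto the hyperplane through o that is
    perpendicular to the vector o->t (one dimension is removed)."""
    direction = [ti - oi for ti, oi in zip(t, o)]
    norm_sq = sum(x * x for x in direction)
    if norm_sq == 0:
        raise ValueError("t must differ from o")
    projected = []
    for p in points:
        # Signed coefficient of (p - o) along the direction o->t.
        c = sum((pi - oi) * di for pi, oi, di in zip(p, o, direction)) / norm_sq
        projected.append([pi - c * di for pi, di in zip(p, direction)])
    return projected


if __name__ == "__main__":
    o = (0.0, 0.0, 0.0)
    level1 = project_out([(1.0, 2.0, 3.0)], o, t=(0.0, 0.0, 2.0))
    level2 = project_out(level1, o, t=(1.0, 0.0, 0.0))
    print(level1, level2)  # [[1.0, 2.0, 0.0]] [[0.0, 2.0, 0.0]]
```

Enumerating all $2^{2r}$ children at every level, as the algorithm does, is what produces the exponential factor in the running time; the dependence on $n$ and $d$ stays linear because each projection costs $O(nd)$.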

Theorem 2. With probability $(1 - \epsilon)^4$, the output $T$ from Algorithm Recursive-Projection contains one root-to-leaf path such that the $j$ points associated with the path determine a flat $\mathcal{F}$ satisfying the inequality $\frac{1}{|P'|}\sum_{p \in P'}\|p, \mathcal{F}\|^2 \le (1 + 5\sqrt{j}\,r)^{2j}\frac{1}{|P|}\sum_{p \in P}\|p, \mathcal{F}_{opt}\|^2$, where $P'$ is a subset of $P$ with at least $(1 - \gamma)|P|$ points and $\mathcal{F}_{opt}$ is the optimal fitting flat for $P$ among all $j$-dimensional flats passing through $o$.

Proof. Without loss of generality, we assume that $o$ is the origin of $\mathbb{R}^d$. Since all the related flats in the algorithm pass through $o$, we can view each flat as a subspace in $\mathbb{R}^d$.

From the algorithm, we know that for any node v at level / of T, 1 < / < j , there is a corresponding 
implicit point set P, which is the projection of P on There is also an implicit flat H J- opt in 
f^. By Theorem [T| we know that there is one child of v, denoted as v' , such that the rotation for 
fv<^J^opt induced by ot^i is a Z\-rotation with respect to a subset P' of P with size > (1 — j 

where A = ^^^\ 5p . . Thus, if we always select such children (satisfying the above condition) 

from root to leaf, we have a path with nodes {fo,fi, • • • ,Vj}, where vq is the root, and vi+i is the 
child of vi. Correspondingly, a sequence of implicit point sets {Po,Pi, - ■ ■ , Pj} and a sequence of flats 
{-Fo, J^i, • • • ,J^j} can also be obtained, which have the following properties. 

1. Initially, = Topt, Po = P- 

2. For any 1 < I < j, J^i is the rotation of J^i-iCi fy^_-^ induced by ot^^, Pi is a subset of the projection 
of Pi^i on fvi_i with size at least (1 — ?)|Pi_i|, and is a Z\-rotation with respect to Pi (see 
Figure W\ ) . 

Note that since both and J^i-i D fvi^i locate on fvi_i, they are all perpendicular to otyi_^^. The 
following claim reveals the dimensionality of each (see Appendix for proof). 

Fig. 7. An example illustrating how $\mathcal{F}_{l-1}$ evolves into $\mathcal{F}_l$.

Claim (2). For $1 \le l \le j$, $\mathcal{F}_l$ is a $(j - l + 1)$-dimensional subspace.

By Lemma 3, we can easily have the following claim.

Claim (3). For any $1 \le l \le j$, $\sum_{p\in P_l}\|p,\mathcal{F}_l\|^2 \le (1+5\sqrt{jr})^2 \sum_{p\in P_l}\|p,\mathcal{F}_{l-1}\cap f_{v_{l-1}}\|^2$.

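We construct as follows two other sequences, $\{\tilde{P}_1, \dots, \tilde{P}_j\}$ and $\{\tilde{\mathcal{F}}_1, \dots, \tilde{\mathcal{F}}_j\}$, for point sets and flats
respectively.

1. Initially, $\tilde{\mathcal{F}}_1 = \mathcal{F}_1$, $\tilde{P}_1 = P_1$.

2. For any $2 \le l \le j$, $\tilde{\mathcal{F}}_l = \mathrm{span}\{\mathcal{F}_l, \overrightarrow{ot_{v_1}}, \dots, \overrightarrow{ot_{v_{l-1}}}\}$, and $\tilde{P}_l$ is the corresponding point set of $P_l$
mapped back from $f_{v_{l-1}}$ to $\mathbb{R}^d$.

From the above construction for $\tilde{\mathcal{F}}_l$, we have the following claim.

Claim (4). For any point $p \in P_l$, let $\tilde{p}$ denote the corresponding point in $\mathbb{R}^d$. Then $\|p,\mathcal{F}_l\| = \|\tilde{p},\tilde{\mathcal{F}}_l\|$.

From Claim (2), we know that each $\tilde{\mathcal{F}}_l$ is a $j$-dimensional subspace. From the algorithm, we know that
$\mathrm{span}\{\mathcal{F}_{l-1} \cap f_{v_{l-1}}, \overrightarrow{ot_{v_{l-1}}}\} = \mathcal{F}_{l-1}$ (see Fig. 7), which implies $\mathrm{span}\{\mathcal{F}_{l-1} \cap f_{v_{l-1}}, \overrightarrow{ot_{v_1}}, \dots, \overrightarrow{ot_{v_{l-1}}}\} = \tilde{\mathcal{F}}_{l-1}$.
Then by Claim (3) and (4), we have the following inequality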

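$$\sum_{\tilde{p}\in\tilde{P}_l}\|\tilde{p},\tilde{\mathcal{F}}_l\|^2 \le (1+5\sqrt{jr})^2 \sum_{\tilde{p}\in\tilde{P}_l}\|\tilde{p},\tilde{\mathcal{F}}_{l-1}\|^2. \tag{5}$$

By the definition of $\tilde{P}_l$, we have $\tilde{P}_j \subseteq \tilde{P}_{j-1} \subseteq \cdots \subseteq \tilde{P}_1 \subseteq \tilde{P}_0 = P$. Thus,

$$\sum_{\tilde{p}\in\tilde{P}_l}\|\tilde{p},\tilde{\mathcal{F}}_{l-1}\|^2 \le \sum_{\tilde{p}\in\tilde{P}_{l-1}}\|\tilde{p},\tilde{\mathcal{F}}_{l-1}\|^2. \tag{6}$$

Combining (5) and (6), we have

$$\sum_{\tilde{p}\in\tilde{P}_l}\|\tilde{p},\tilde{\mathcal{F}}_l\|^2 \le (1+5\sqrt{jr})^2 \sum_{\tilde{p}\in\tilde{P}_{l-1}}\|\tilde{p},\tilde{\mathcal{F}}_{l-1}\|^2. \tag{7}$$

Recursively using the above inequality, we have $\sum_{\tilde{p}\in\tilde{P}_j}\|\tilde{p},\tilde{\mathcal{F}}_j\|^2 \le (1+5\sqrt{jr})^{2j}\sum_{p\in P}\|p,\mathcal{F}_{opt}\|^2$.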
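From the definition of $\tilde{P}_j$, we know that $|\tilde{P}_j| \ge (1-\gamma/j)^j|P| \ge (1-\gamma)|P|$. Furthermore, by Claim
(2), we know that $\mathcal{F}_j$ is a 1-dimensional flat (i.e., the single line spanned by $\overrightarrow{ot_{v_j}}$), hence $\tilde{\mathcal{F}}_j =
\mathrm{span}\{\overrightarrow{ot_{v_1}}, \dots, \overrightarrow{ot_{v_j}}\}$. Thus, if setting $P' = \tilde{P}_j$ and $\mathcal{F} = \tilde{\mathcal{F}}_j$, we have $\frac{1}{|P'|}\sum_{p\in P'}\|p,\mathcal{F}\|^2 \le (1+5\sqrt{jr})^{2j}\,\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}_{opt}\|^2$.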

Success probability: Each time we use Theorem 1, the success probability is $(1-\epsilon/j)^2$ (we replace
$\epsilon$ by $\epsilon/j$ to increase the sample size). Thus, the total success probability is $(1-\epsilon/j)^{2j} \ge (1-\epsilon)^2$. □
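
The two bounds used at the end of the proof, $(1-\epsilon/j)^{2j} \ge (1-\epsilon)^2$ and $(1-\gamma/j)^j \ge 1-\gamma$, are both instances of Bernoulli's inequality. The following is a minimal numerical sketch of that arithmetic only; the function names are hypothetical and not part of the paper.

```python
def path_success_probability(eps: float, j: int) -> float:
    """Probability that all j applications of Theorem 1 along one
    root-to-leaf path succeed, each with probability (1 - eps/j)^2."""
    return (1.0 - eps / j) ** (2 * j)


def surviving_fraction(gamma: float, j: int) -> float:
    """Lower bound on the fraction of points kept after j levels,
    when each level keeps at least a (1 - gamma/j) fraction."""
    return (1.0 - gamma / j) ** j


def bounds_hold(eps: float, gamma: float, j: int, tol: float = 1e-12) -> bool:
    """Check both Bernoulli-type bounds for one parameter setting."""
    ok_prob = path_success_probability(eps, j) >= (1.0 - eps) ** 2 - tol
    ok_size = surviving_fraction(gamma, j) >= (1.0 - gamma) - tol
    return ok_prob and ok_size
```

For instance, `bounds_hold(0.2, 0.1, 5)` evaluates both inequalities for $\epsilon = 0.2$, $\gamma = 0.1$, $j = 5$.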

5.3 Algorithm for Projective Clustering 
Algorithm Projective-Clustering 

Input: A set of points $P$ in $\mathbb{R}^d$, positive integers $k$, $j$, and two positive numbers $0 < \epsilon, \gamma < 1$.
Output: An approximate solution for $(k, j)$-projective clustering.

1. Use the dimension reduction technique in to reduce the dimensionality from dio d! = (^)'^(^). 

2. Set the sample size r = ^ ln(2A;t), where t = ^ In 

3. Run Algorithm Recursive-Projection $k$ times with sample size $r$. Denote the $k$ output trees as
$\{\mathcal{T}_1, \dots, \mathcal{T}_k\}$.

4. Enumerate the combinations of all $k$ flats from the $k$ trees: Select one flat yielded by a root-to-leaf
path from each $\mathcal{T}_l$ for $1 \le l \le k$, and compute the objective value of these $k$ flats. Let $\mathcal{L}$ be the
smallest objective value among all the combinations.

5. Re-run Algorithm Recursive-Projection $k$ times with the following modification: In Step 2(b), for
each point $p$ of the $2^{2r}$ points returned by Algorithm Symmetric-Sampling, build a ball $B$ centered
at $p$ and with radius $r_B$, and construct a grid inside $B$. For each grid point, create a node, associate
it with the grid point, and make it a sibling of the node containing $p$.

6. Enumerate the combinations of all $k$ flats from the $k$ output trees of the above step. Find the $k$
flats with the smallest objective value for $P$ and output them as the solution.
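
The enumeration in Steps 4 and 6 can be sketched as follows. This is only an illustrative sketch under stated assumptions: each tree is assumed to have already been flattened into a list of candidate flats, a flat is represented (hypothetically) as a pair `(origin, orthonormal_basis)`, and the objective is the average squared distance from each point to its closest flat. None of these names or representations come from the paper.

```python
from itertools import product


def sq_dist_to_flat(p, origin, basis):
    """Squared distance from point p to the flat origin + span(basis).
    The basis vectors are assumed to be orthonormal."""
    v = [pc - oc for pc, oc in zip(p, origin)]
    sq = sum(c * c for c in v)
    for b in basis:
        proj = sum(vc * bc for vc, bc in zip(v, b))
        sq -= proj * proj  # remove the component lying inside the flat
    return max(sq, 0.0)  # guard against tiny negative rounding errors


def objective(points, flats):
    """Average squared distance from each point to its closest flat."""
    total = sum(min(sq_dist_to_flat(p, o, b) for o, b in flats) for p in points)
    return total / len(points)


def best_combination(points, candidates_per_tree):
    """Try one candidate flat from each tree and keep the best k-tuple."""
    best_value, best_flats = float("inf"), None
    for combo in product(*candidates_per_tree):
        value = objective(points, combo)
        if value < best_value:
            best_value, best_flats = value, combo
    return best_value, best_flats
```

The number of combinations is the product of the candidate-list sizes, which is why the tree sizes enter the running time multiplicatively.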

In the above algorithm, the radius and the density of the grid are chosen in a way so that 
there exists a grid point which induces a ^-rotation for the clustering points. Thus, we can further 
reduce the approximation ratio to (1 + e), and have the following theorem. Detailed analysis on the 
algorithm and the theorem is left in Section 10 of the Appendix. 

Theorem 3. Let $P$ be a set of points in a $(k, j)$-projective clustering instance. Let $Opt$ be

the optimal objective value on P. With constant probability and in 0{2^°^^^''i^nd) time, Algorithm 
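Projective-Clustering outputs an approximate solution $\{\mathcal{F}_1, \dots, \mathcal{F}_k\}$ such that each $\mathcal{F}_l$ is a $j$-flat, and
$\frac{1}{|P'|}\sum_{p\in P'}\min_{1\le l\le k}\|p,\mathcal{F}_l\|^2 \le (1+\epsilon)Opt$, where $P'$ is a subset of $P$ with at least $(1-\gamma)|P|$ points.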

5.4 Extensions. 

We present two main extensions. See Appendix for details. 

Linear time PTAS for regular projective clustering: For points with bounded Coefficient
of Variation (CV) (we call such a problem regular projective clustering), we show that our approach
leads to a linear time PTAS. The main idea is that since the CV is bounded, the point
returned by the symmetric sampling algorithm is far enough from $o$. This implies that each $\Delta$-rotation from
Algorithm Recursive-Projection is for the whole set $P$ rather than a subset $P'$. Thus we can fit all
points of $P$ into the resulting $k$ flats within the same approximation ratio.

$L_\tau$ sense projective clustering: We show that our approach can be extended to $L_\tau$ sense
projective clustering for any integer $1 \le \tau < \infty$ and achieve similar results for both general and
regular projective clustering. The key idea is to define the $L_\tau$ sense $\Delta$-rotation, and prove a result
similar to Lemma 3. In other words, the symmetric sampling technique can also yield an $L_\tau$ sense
$\Delta$-rotation with
6 Proof of Lemma 4

Proof. If we randomly select $z$ elements from $S$, then it is easy to know that with probability
$1-(1-\alpha)^z$, there is at least one element from the sample belonging to $S'$. If we want the probability
$1-(1-\alpha)^z$ to equal $1-\eta/t$, $z$ has to be

$$z = \frac{\ln(\eta/t)}{\ln(1-\alpha)} = \frac{\ln(t/\eta)}{\ln\left(1+\frac{\alpha}{1-\alpha}\right)} \le \frac{\ln(t/\eta)}{\ln(1+\alpha)}.$$

Thus if we perform $t$ rounds of random sampling with each round selecting $\frac{\ln(t/\eta)}{\ln(1+\alpha)}$ elements, we
get at least $t$ elements from $S'$ with probability at least $(1-\eta/t)^t \ge 1-\eta$. □
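
The sample-size arithmetic in this proof can be sketched as below. The function names are hypothetical; the sketch assumes sampling with replacement, that $S'$ makes up an $\alpha$ fraction of $S$, and that $t/\eta > 1$.

```python
import math


def per_round_size(alpha: float, t: int, eta: float) -> int:
    """Elements drawn in one round: ceil(ln(t/eta) / ln(1 + alpha))."""
    return math.ceil(math.log(t / eta) / math.log(1.0 + alpha))


def total_sample_size(alpha: float, t: int, eta: float) -> int:
    """t rounds, each drawing per_round_size elements."""
    return t * per_round_size(alpha, t, eta)


def round_miss_probability(alpha: float, z: int) -> float:
    """Probability that z independent draws all miss S'."""
    return (1.0 - alpha) ** z


def overall_success_lower_bound(eta: float, t: int) -> float:
    """Probability that every one of the t rounds hits S' at least once,
    given that each round misses with probability at most eta/t."""
    return (1.0 - eta / t) ** t
```

With $\alpha = 0.1$, $t = 10$, $\eta = 0.5$, each round draws 32 elements, so 320 in total.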

7 Proof for Claim 1 in Theorem 1

Proof (of Claim (1)). We first reduce the space from $\mathbb{R}^d$ to $\mathcal{F}^\perp$, which is a $(d-j)$-dimensional
subspace. For simplicity, we use the same notations for points in $\mathcal{F}^\perp$ as in $\mathbb{R}^d$. It is easy to know that
during the space reduction from $\mathbb{R}^d$ to $\mathcal{F}^\perp$, $Proj(p_{i_0})$ is projected to the origin, and $\|p_{i_0} - Proj(p_{i_0})\|$
is equal to $\|p_{i_0} - o\|$ in the subspace. We let $B^1_{i_0} = B_{i_0} \cap -S$, and $B^2_{i_0} = B_{i_0} \cap S$. Correspondingly,
we let $-B^1_{i_0}$ denote the symmetric point set of $B^1_{i_0}$ with respect to $o$. Without loss of generality,
we assume that $|B^1_{i_0}| \ge |B^2_{i_0}|$. Let $s_1$ and $s_2$ be the mean points of $B^1_{i_0}$ and $B^2_{i_0}$ respectively, and
$\alpha_1 = \frac{|B^1_{i_0}|}{|B_{i_0}|}$ and $\alpha_2 = \frac{|B^2_{i_0}|}{|B_{i_0}|}$. Since $p_{i_0}$ is the mean point of $B_{i_0}$, we have $p_{i_0} = \alpha_1 s_1 + \alpha_2 s_2$. Thus,

$$\begin{aligned}
\|p_{i_0} - o\| &= \|\alpha_1 s_1 + \alpha_2 s_2 - o\| = \|\alpha_1(2o - s_1) + \alpha_2 s_2 - 2\alpha_1 o + 2\alpha_1 s_1 - o\| \\
&= \|\alpha_1(2o - s_1) + \alpha_2 s_2 - o - 2\alpha_1(2o - s_1 - o)\| \\
&\le \|\alpha_1(2o - s_1) + \alpha_2 s_2 - o\| + 2\alpha_1\|2o - s_1 - o\|,
\end{aligned} \tag{8}$$

where the last inequality follows from triangle inequality.

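Note that $2o - s_1$ is the mean point of $-B^1_{i_0}$, $\alpha_1(2o - s_1) + \alpha_2 s_2$ is the mean point of $-B^1_{i_0} \cup B^2_{i_0}$,
and $-B^1_{i_0} \cup B^2_{i_0}$ is a sample from $P$ of size at least $\frac{1}{\epsilon}$. Let $\pi$ be the mean point of $P$, and $\sigma^2 =
\frac{1}{|P|}\sum_{p\in P}\|p - \pi\|^2$ (note that the current space is $\mathcal{F}^\perp$). Then, by the lemma bounding the distance
between the mean of a random sample and $\pi$, we know that with
probability $1-\eta$, $\|\alpha_1(2o - s_1) + \alpha_2 s_2 - \pi\| \le \sqrt{\epsilon/\eta}\,\sigma$. Similarly, since $-B^1_{i_0}$ is a sample from $P$ with
size at least $\frac{1}{2\epsilon}$ (by $|B^1_{i_0}| \ge |B^2_{i_0}|$), we have $\|2o - s_1 - \pi\| \le \sqrt{2\epsilon/\eta}\,\sigma$ with probability $1-\eta$. Since
$\delta^2_{P,\mathcal{F}} = \frac{1}{|P|}\sum_{p\in P}\|p - o\|^2 = \sigma^2 + \|\pi - o\|^2$, with a total probability $(1-\eta)(1-\eta)$, we have

$$\begin{aligned}
&\|\alpha_1(2o - s_1) + \alpha_2 s_2 - o\| + 2\alpha_1\|2o - s_1 - o\| \\
&\quad\le \|\alpha_1(2o - s_1) + \alpha_2 s_2 - \pi\| + \|\pi - o\| + 2\alpha_1(\|2o - s_1 - \pi\| + \|\pi - o\|) \\
&\quad\le \sqrt{\epsilon/\eta}\,\sigma + 2\alpha_1\sqrt{2\epsilon/\eta}\,\sigma + (1 + 2\alpha_1)\|\pi - o\| \\
&\quad\le (1 + 2\sqrt{2})\sqrt{\epsilon/\eta}\,\sigma + 3\|\pi - o\| \\
&\quad\le \sqrt{(1 + 2\sqrt{2})^2\epsilon/\eta + 9}\,\sqrt{\sigma^2 + \|\pi - o\|^2} = \sqrt{(1 + 2\sqrt{2})^2\epsilon/\eta + 9}\;\delta_{P,\mathcal{F}},
\end{aligned} \tag{9}$$

where the first inequality follows from triangle inequality, and the last inequality follows from the
fact that $x_1 y_1 + x_2 y_2 \le \sqrt{x_1^2 + x_2^2}\sqrt{y_1^2 + y_2^2}$ for any four real numbers $x_1, x_2, y_1, y_2$.

Setting $\eta = \epsilon$, by (8) and (9), we have $\|p_{i_0} - o\| \le \sqrt{18 + 4\sqrt{2}}\;\delta_{P,\mathcal{F}} \le 5\delta_{P,\mathcal{F}}$, with probability $(1-\epsilon)^2$.

□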
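
The constant at the end of this proof comes from $(1+2\sqrt{2})^2 + 9 = 18 + 4\sqrt{2}$ once $\eta = \epsilon$. A small sketch of that arithmetic, together with the two-term Cauchy-Schwarz step used for the last inequality, is given below; the helper names are hypothetical.

```python
import math


def claim1_constant(eps: float, eta: float) -> float:
    """Factor multiplying delta in (9): sqrt((1 + 2*sqrt(2))^2 * eps/eta + 9)."""
    return math.sqrt((1.0 + 2.0 * math.sqrt(2.0)) ** 2 * eps / eta + 9.0)


def cauchy_schwarz_holds(x1: float, x2: float, y1: float, y2: float) -> bool:
    """Check x1*y1 + x2*y2 <= sqrt(x1^2 + x2^2) * sqrt(y1^2 + y2^2)."""
    lhs = x1 * y1 + x2 * y2
    rhs = math.hypot(x1, x2) * math.hypot(y1, y2)
    return lhs <= rhs + 1e-12
```

Calling `claim1_constant(eps, eps)` for any `eps > 0` gives a value slightly below 5, which is where the bound $5\delta_{P,\mathcal{F}}$ comes from.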

8 Proof of Claim 2 in Theorem 2

Proof. We prove this claim by induction. For the base case (i.e., $l = 1$), $\mathcal{F}_1$ has the same dimensionality
as $\mathcal{F}_0 \cap f_{v_0} = \mathcal{F}_{opt}$ (note that $f_{v_0} = \mathbb{R}^d$). Hence $\mathcal{F}_1$ is a $j$-dimensional subspace. Then we assume
$\mathcal{F}_w$ is a $(j-w+1)$-dimensional subspace for any $w \le l$ (i.e., induction hypothesis). Now, we
consider the case of $l+1$. Since $\mathcal{F}_{l+1}$ is only a rotation of $\mathcal{F}_l \cap f_{v_l}$, they have the same dimensionality.
Also, from the algorithm, we know that $\mathcal{F}_l \cap f_{v_l}$ is the subspace in $\mathcal{F}_l$ which is perpendicular to
$Proj(\overrightarrow{ot_{v_l}})$, where $Proj(\overrightarrow{ot_{v_l}})$ is the projection of $\overrightarrow{ot_{v_l}}$ on $\mathcal{F}_l$. Thus, $\mathcal{F}_l \cap f_{v_l}$ is a $(j-l+1-1)$-dimensional
subspace, which implies that $\mathcal{F}_{l+1}$ is also a $(j-l)$-dimensional subspace. Hence, Claim
(2) is proved. □



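9 Proof of Lemma 6

Proof. For simplicity, we let $\|\mathcal{F},\mathcal{F}'\|$ denote $dist\{\mathcal{F},\mathcal{F}'\}$. By triangle inequality, for any $p \in P$, we
have $\|p,\mathcal{F}'\| \le \|p,\mathcal{F}\| + \|\mathcal{F},\mathcal{F}'\|$. Thus,

$$\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}'\|^2 \le \frac{1}{|P|}\sum_{p\in P}\left(\|p,\mathcal{F}\| + \|\mathcal{F},\mathcal{F}'\|\right)^2. \tag{10}$$

Let $c = \frac{\|\mathcal{F},\mathcal{F}'\|}{\sqrt{\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^2}}$. Then, we have

$$2\|p,\mathcal{F}\|\,\|\mathcal{F},\mathcal{F}'\| \le c\Big(\|p,\mathcal{F}\|^2 + \frac{1}{|P|}\sum_{p'\in P}\|p',\mathcal{F}\|^2\Big). \tag{11}$$

Combining (10) and (11), we have

$$\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}'\|^2 \le \frac{1+2c}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^2 + \|\mathcal{F},\mathcal{F}'\|^2 = (1+c)^2\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^2.$$

Thus, the lemma is true. □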



10 Detailed Analysis of Algorithm Projective-Clustering

Sample size and success probability: Theorem [2] enables us to find a good flat for the whole 
point set P. To find k good flats, one for each cluster, we need to increase the probability from 
(1 — e)^ to (1 — |)^, and replace 7 by ^. Thus, for each cluster, we need t = ^ In ^ points. By 
Lemma |4j we know that if we want a random sample containing at least t points from one cluster 
with probability 1 — we need to sample r = * points from P, where a is the fraction of the 
cluster in P. Note that since our algorithm only focuses on emulating the behavior of large clusters, 
a > This means that it is sufficient to set the sample size r to be ^ln2A;t. The total success 
probability is therefore ((1 - ^)(1 - f )'^)*^ > 1(1 - e)"^. 

Radius and grid density: Let Opt be the optimal objective value of projective clustering on P. By 
Theorem [2I we know that Step 4 in the algorithm outputs an objective value £ < (1 + 5^/jr)'^^Opt. 
From the proof of Theorem [2j we know that on the path generating the resulting flat, if each node 
incurs a j7-rotation rather than a Z\-rotation, the approximation ratio will become (1 + j?)^-' < 1 + e 
(instead of + Thus, to reduce 5^/Jr to fj,we can build a grid around the point associated

with each node v in the tree T generated by Algorithm Recursive-Projection. For each grid point, 
add a node as a new sibling of v (see Step 5) and associate it with the grid point. The problem is 
how to determine the density of the grid so as to generate the desired approximation ratio. 
To determine the density of the grid, we first have 



l<-^<(l + 5v^ry. (12) 

We use the same notations as in Theorem [2j Let vi be the current node in Algorithm Recursive- 
Projection, and Proj{t^j be the projection of on fy^ n Topt- Further, let /i^ = jpjYlp^Pi II < 

p, Proj{ty^) > IP, and h' = \ \o — Proj(t^, )||. By Definition [sj we have 

\K-Projity,)m^^^^^ (13) 



Combining (12) and (13), we have 



- Proj{t^ 



> 1. (14) 



By Lemma j2| we know that j < ^ < 1 (note that t = ^ In as discussed previously). Hence, we 
can set the radius of B to be 5-y/Jr-v/Z- Thus, rg > \\ty^ — Proj{tvj)\\, which implies that Proj{tyi) 
locates inside B. If we set eo = 4j(i+5^rB)JAt ' construct a grid inside B with grid length -^^q, 
then there is one grid point vr satisfying the following inequality. 

Ilvr - Proj{t..,)\\ < JdAsr = eoTB '^^'"^ 



Vd"^ 4j-(l + 5Vjr)iAt 



- 4it^^' ^^^^ 



where the last inequality follows from (12). Combining inequalities (15) and j < ^, we have 
IItt Proj(tvj)\\ ^ ^v^Opi^ which implies that it induces a ^-rotation. 

Running time. The dimension reduction step costs 0(nd-poly(^)) time, and resulting problem has 
diemsion d' =poly(^). Step 3 and 4 take 0{k2'^^^^nd') time. Note that in Step 5, the complexity of 
the grid inside B is (-^)'^'- Thus, the complexity of each 71 will increase to 0((2^^(^^)'^')-'). Hence, 

the total running time is 2^°^^'^^~i^ nd. 



11 PTAS for Regular Projective Clustering 

This section first introduces the regular projective clustering problem, and then presents a PTAS for 
it. We start our discussion with a concept used in statistics. 

Definition 6 (Coefficient of Variation (CV)). Let $x$ be a random variable, and $\mu = E[x]$ be its
expectation. The coefficient of variation of $x$ is denoted as $\frac{\sqrt{E[(x-\mu)^2]}}{E[|x-\mu|]}$.

CV of a single variable aims to measure the dispersion of the variable in a way that does not
depend on the variable's actual value. The higher the CV, the greater the dispersion in the variable.
Distributions with CV < 1 (such as an Erlang distribution) are considered low-variance, while
those with CV > 1 (such as a hyper-exponential distribution) are considered high-variance. Note
that many commonly encountered distributions have constant CV (e.g., the Gaussian distribution$^1$).

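Lemma 7. Let $X = \{x_1, \dots, x_n\}$ be a set of $n$ numbers with coefficient of variation $\omega$, and $S =
\{x_{j_1}, \dots, x_{j_m}\}$ be a random sample of $X$. Then, for any positive constant $\eta$,

$$Prob\left(\frac{\sum_{i=1}^m |x_{j_i} - \mu|}{m} \ge \Big(1 - \eta\sqrt{\frac{\omega^2 - 1}{m}}\Big)\frac{\sigma}{\omega}\right) \ge 1 - \frac{1}{\eta^2},$$

where $\mu$ and $\sigma$ denote the mean and the standard deviation of $X$.

Proof. Let $\bar{\mu} = \frac{1}{n}\sum_{i=1}^n |x_i - \mu|$. Then, from the definition of CV, we know that $\omega = \frac{\sigma}{\bar{\mu}}$,
which implies $\sigma = \omega\bar{\mu}$.

Since the variance of $\{|x_i - \mu| \mid 1 \le i \le n\}$ is $\frac{1}{n}\sum_{i=1}^n (|x_i - \mu| - \bar{\mu})^2 = \sigma^2 - \bar{\mu}^2$, and $S$ is the
random sample from $X$ with size $m$, we know that the expected value and variance of $\frac{\sum_{i=1}^m |x_{j_i} - \mu|}{m}$
are $\bar{\mu}$ and $\frac{\sigma^2 - \bar{\mu}^2}{m}$ respectively. By Markov inequality, we know

$$Prob\left(\Big|\frac{\sum_{i=1}^m |x_{j_i} - \mu|}{m} - \bar{\mu}\Big| \ge \eta\sqrt{\frac{\sigma^2 - \bar{\mu}^2}{m}}\right) \le \frac{1}{\eta^2}. \tag{16}$$

Meanwhile, since $\Big|\frac{\sum_{i=1}^m |x_{j_i} - \mu|}{m} - \bar{\mu}\Big| \le \eta\sqrt{\frac{\sigma^2 - \bar{\mu}^2}{m}}$ implies $\frac{\sum_{i=1}^m |x_{j_i} - \mu|}{m} \ge \bar{\mu} - \eta\sqrt{\frac{\sigma^2 - \bar{\mu}^2}{m}}$,
we have

$$Prob\left(\frac{\sum_{i=1}^m |x_{j_i} - \mu|}{m} \ge \bar{\mu} - \eta\sqrt{\frac{\sigma^2 - \bar{\mu}^2}{m}}\right) \ge Prob\left(\Big|\frac{\sum_{i=1}^m |x_{j_i} - \mu|}{m} - \bar{\mu}\Big| \le \eta\sqrt{\frac{\sigma^2 - \bar{\mu}^2}{m}}\right). \tag{17}$$

Combining (16) and (17), we have

$$Prob\left(\frac{\sum_{i=1}^m |x_{j_i} - \mu|}{m} \ge \bar{\mu} - \eta\sqrt{\frac{\sigma^2 - \bar{\mu}^2}{m}}\right) \ge 1 - \frac{1}{\eta^2}.$$

Recall that $\sigma = \omega\bar{\mu}$. If we replace $\bar{\mu}$ by $\frac{\sigma}{\omega}$, the above inequality becomes

$$Prob\left(\frac{\sum_{i=1}^m |x_{j_i} - \mu|}{m} \ge \Big(1 - \eta\sqrt{\frac{\omega^2 - 1}{m}}\Big)\frac{\sigma}{\omega}\right) \ge 1 - \frac{1}{\eta^2}.$$

□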
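
The quantities in this proof can be computed directly on a small data set. The sketch below assumes, as the proof suggests, that the coefficient of variation is the ratio of the standard deviation to the mean absolute deviation, and that the sample is drawn uniformly with replacement; both are assumptions of this illustration, and all function names are hypothetical. It evaluates the threshold $\bar{\mu} - \eta\sqrt{(\sigma^2 - \bar{\mu}^2)/m}$ and, by exhaustive enumeration of all samples, the exact fraction of samples meeting it.

```python
import math
from itertools import product


def mean(values):
    return sum(values) / len(values)


def coefficient_of_variation(xs):
    """Assumed definition: standard deviation / mean absolute deviation."""
    mu = mean(xs)
    sigma = math.sqrt(mean([(x - mu) ** 2 for x in xs]))
    mad = mean([abs(x - mu) for x in xs])
    return sigma / mad


def lower_threshold(xs, m, eta):
    """(1 - eta * sqrt((omega^2 - 1) / m)) * sigma / omega, with sigma/omega = mad."""
    mu = mean(xs)
    mad = mean([abs(x - mu) for x in xs])
    omega = coefficient_of_variation(xs)
    return (1.0 - eta * math.sqrt((omega ** 2 - 1.0) / m)) * mad


def exact_success_fraction(xs, m, eta):
    """Fraction of all size-m samples (with replacement) whose mean
    absolute deviation from mu is at least the threshold."""
    mu = mean(xs)
    threshold = lower_threshold(xs, m, eta)
    hits = total = 0
    for sample in product(xs, repeat=m):
        total += 1
        if mean([abs(x - mu) for x in sample]) >= threshold - 1e-12:
            hits += 1
    return hits / total
```

Exhaustive enumeration costs $n^m$ evaluations, so it is only meant for tiny illustrative inputs.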



Using coefficient of variation, we introduce the regular projective clustering problem.
Let $P$ be a point set in $\mathbb{R}^d$, $\mathcal{F}_{opt}$ be its optimal $j$-dimensional flat fitting, and $o$ be its mean point.
It is easy to see that $o$ locates on $\mathcal{F}_{opt}$.

Definition 7 (Regular Single j-Flat Fitting). A single $j$-flat fitting problem with input point
set $P$ is regular if for any direction $\vec{u}$, the coefficient of variation of $\{\langle \overrightarrow{op}, \vec{u} \rangle \mid p \in P\}$ is bounded
by some constant $\omega$. $\omega$ is called the regular factor of $P$.

Definition 8 (Regular (k, j)-Projective Clustering). A $(k, j)$-projective clustering problem with
input point set $P$ and optimal clusters $\{C_1, \dots, C_k\}$ is regular if each $C_i$ is a regular single $j$-flat
fitting problem.



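$^1$ Let $f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{x^2}{2\sigma^2}}$ be any Gaussian distribution with mean point at the origin. Then, $E[|x|] =
\frac{2}{\sqrt{2\pi}\sigma}\int_0^{+\infty} x e^{-\frac{x^2}{2\sigma^2}}\,dx = \sqrt{\frac{2}{\pi}}\,\sigma$. Since $E[x^2] = \sigma^2$, we have $CV = \frac{\sigma}{\sqrt{2/\pi}\,\sigma} = \sqrt{\frac{\pi}{2}}$.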
The following theorem is a counterpart of Theorem 1 for regular projective clustering.



Theorem 4. Let P = {pi, • • • ,Pri} be the input point set of a regular single j-flat fitting problem 
with mean point a and regular factor oj, and S be its random sample of size m = , where e > 
is a small constant. Let Topt be a j -dimensional flat, and S is the set of points returned by Algorithm 
Symmetric- Sampling on S and a. Then with probability (1 — e)(l — S contains one point s 

such that the flat T' rotated from Topt o-nd induced by satisfies the following inequality, 

peP ^ p&P 



Fig. 8. An example illustrating Theorem 4.



Proof. Without loss of generality, we assume that $o$ is the origin. Since $\mathcal{F}_{opt}$ passes through $o$, it
passes through the origin. Thus, we can assume $\mathcal{F}_{opt}$ is the $j$-dimensional subspace spanned by the
first $j$ dimensions. Let $(x_i^1, \dots, x_i^d)$ be the coordinates of each point $p_i \in P$, and $\delta_l^2 = \frac{1}{n}\sum_{i=1}^n (x_i^l)^2$
for $1 \le l \le j$.

We consider the division $P = P_L \cup P_R$ (see Fig. 8), where $P_L = \{p_i \in P \mid x_i^1 < 0\}$ and $P_R = \{p_i \in
P \mid x_i^1 \ge 0\}$. Let $-P_L = \{-p_i \mid p_i \in P_L\}$. Consider the point set $-P_L \cup P_R$. Since $P$ is regular with a
regular factor $\omega$ (which is a positive constant) and the mean of $\{x_1^l, \dots, x_n^l\}$ is 0 (due to the fact
that the mean point of $P$ is the origin), the coefficient of variation of $\{x_1^l, \dots, x_n^l\}$ is no more than
$\omega$, for all $1 \le l \le j$.

For the sample set $S$, we define a subset $T$ of $-S \cup S$ as $T = (-S \cap -P_L) \cup (S \cap P_R) =
-(S \cap P_L) \cup (S \cap P_R)$. Since $S = (S \cap P_L) \cup (S \cap P_R)$, it is easy to see that $|T| = |S| = m$. Let $y$ be
the mean point of $T$ with coordinates $(y_1, \dots, y_d)$. Since Algorithm Symmetric-Sampling enumerates
the mean points of all subsets of $-S \cup S$, $y$ is clearly in $\mathcal{S}$. If we denote the projection of $y$ on $\mathcal{F}_{opt}$
as $Proj(y)$, then it is easy to know that $Proj(y) = (y_1, \dots, y_j, 0, \dots, 0)$. Let $\vec{u}$ be the unit vector

Projiy) _ ( yi yj n r.\ 

\\Proj{y)\\ - ^,/„2 4....+„2' './„?■ ■ 

-'opt 



_ yi+-+yj ^jyi~ 

Let 5lpi = ^ Yll=i WVi^^optW^ ■ Below, we prove that y is the desired point which induces a A- 



rotation for Topt with A = Yzr^^crpt- 



we know that in order to prove that y induces a l^^f^opt-rotation, we just need 



By Definition 

IIPro"°/\ll — \-'^/~ ~r ""^^ ^''^ other words, we need to prove two things: (a) 



to show that ^ ^ ^ 



ll-P^oi(y)ll jg larger than certain value, and (b) Pj'oo{y)\\ gj^ialler than certain value. 

^'^T.U\<n,-^>\' ' I ^ S.P, 
In order to prove (a), we have



yixi 



i=l 



i=i i=i \lyl + -- - 



Vi^ ^Vj J^i ^ ^ 



(18) 



i=l l=\ 



1=1 

where the inequality follows from the fact that $(\sum_{i=1}^n a_i b_i)^2 \le (\sum_{i=1}^n a_i^2)(\sum_{i=1}^n b_i^2)$ for any
numbers $\{a_i \mid 1 \le i \le n\}$ and $\{b_i \mid 1 \le i \le n\}$.



Meanwhile, since | |-Proj(2/)| I = a/?/! + • • • + y?, by (18) we have 



\\Projiy)\\ 



yf + • • • + ?/ ■ 



^j^^ ^.^ > -^-J Combining this with (19), we have 



\\Projiy)\\ 



> 



1 yi 



(19) 



/ |_y2 

Without loss of generality, we assume that 5i = max{(5i, • • • ,5j}. Then we have w — > 

"i 



Since y is the mean point of T with \T\ = m, and T = —{S n Pl) U {S Ci Pr), we have 



(20) 



yi 



m 



m 



Further, since {x\ \ 1 < i < n} has average value zero, and its CV is bounded by w, by Lemma 



for (a). 



we know that Prob{yi > (1 — "Hx ~ — ~)~) — 1 — Thus, with (20), we have the following result 



Proh{ 



\\Pro3{y)\\ 



> 



1 — rj 



'nE^=l\<P^,^>\' 



— )>i-4 



For (b), following the same approach given in the proof of Claim (1), we can easily get a similar 



result as Claim (1' 



. \\y-P'roj{y)\\ 



1 \2 



Oopt 



< 5 with probability (1 ) 



Combining the above results for (a) and (b), and setting the sample size m = and V — 
we have the following inequality, with probability (1 — e)(l — )^, 



55, 



\y - Proj{y)\\ ^ 

\\Proj{y)\\ - iWiJi^n I |2 



This means that y induces a rotation for Topt, where A = YZ^^opt- By Lemma jsj we get the 



desired result, i.e., EpeP I -^'1 1' < (1 + fS)' EpeP 



opt I 



□ 
By Theorem 4 and a similar idea with Theorem 2, we obtain the following theorem for the regular
single $j$-flat fitting problem.

Theorem 5. Let P he the point set of a regular j-flat fitting problem in with regular factor 
bj. Then if run Algorithm Recursive-Projection on P with sample size m = jq^j with probability 

(1 — e)(l — i^2_i the output T contains a root-to-leaf path such that the j points associated with 
the path determine a fiat T satisfying inequality 

Proof. Without loss of generality, we assume that $o$ is the origin of $\mathbb{R}^d$. Since all related flats in
Algorithm Recursive-Projection pass through $o$, we can view every flat as a subspace in $\mathbb{R}^d$.

From the algorithm, we know that for any node $v$ at level $l$ of $\mathcal{T}$, $1 \le l \le j$, there is a corresponding
implicit point set $P_v$, which is the projection of $P$ on $f_v$. There is also an implicit flat $f_v \cap \mathcal{F}_{opt}$ in
fv By Theorem |4| we know that there is one child of f, denoted as f', such that the rotation of 
fv n Topt induced by ot^i forms a Z\-rotation with respect to Py, where A = f^^- Thus, if we 
always select such children (i.e., satisfying the above condition) from root to leaf, we get a path with
nodes $\{v_0, v_1, \dots, v_j\}$, where $v_0$ is the root and $v_{l+1}$ is the child of $v_l$. Correspondingly, a sequence of
implicit point sets $\{P_0, P_1, \dots, P_j\}$ and a sequence of flats $\{\mathcal{F}_0, \mathcal{F}_1, \dots, \mathcal{F}_j\}$ can also be obtained, which
have the following properties.

1. Initially, $\mathcal{F}_0 = \mathcal{F}_{opt}$, and $P_0 = P$.

2. For any $1 \le l \le j$, $\mathcal{F}_l$ is the $\Delta$-rotation of $\mathcal{F}_{l-1} \cap f_{v_{l-1}}$ induced by $\overrightarrow{ot_{v_l}}$, and $P_l$ is the projection
of $P_{l-1}$ on $f_{v_{l-1}}$ (see Figure 7).



Note that since both $\mathcal{F}_l$ and $\mathcal{F}_{l-1} \cap f_{v_{l-1}}$ locate on $f_{v_{l-1}}$, they are all perpendicular to $\overrightarrow{ot_{v_{l-1}}}$.
The following claim reveals the dimensionality of each $\mathcal{F}_l$.

Claim (5). For $1 \le l \le j$, $\mathcal{F}_l$ is a $(j-l+1)$-dimensional subspace.

Proof. We prove the claim by induction. For the base case (i.e., $l = 1$), $\mathcal{F}_1$ has the same dimensionality
as $\mathcal{F}_0 \cap f_{v_0} = \mathcal{F}_{opt}$ (note $f_{v_0} = \mathbb{R}^d$). Hence, $\mathcal{F}_1$ is a $j$-dimensional subspace. Then we assume that
$\mathcal{F}_w$ is a $(j-w+1)$-dimensional subspace for $w \le l$. Now we consider the case of $l+1$. Since $\mathcal{F}_{l+1}$
is a rotation of $\mathcal{F}_l \cap f_{v_l}$, they have the same dimensionality. Also, from the algorithm, we know that
$\mathcal{F}_l \cap f_{v_l}$ is the subspace in $\mathcal{F}_l$ which is perpendicular to $Proj(\overrightarrow{ot_{v_l}})$, where $Proj(\overrightarrow{ot_{v_l}})$ is the projection
of $\overrightarrow{ot_{v_l}}$ on $\mathcal{F}_l$. Thus, $\mathcal{F}_l \cap f_{v_l}$ is a $(j-l+1-1)$-dimensional subspace, which implies that $\mathcal{F}_{l+1}$ is also
a $(j-l)$-dimensional subspace. Hence, Claim (5) is proved. □

By Lemma 3, we have the following claim.

Claim (6). For any 1 < / < j, Ep^p, \\P,H? < (1 + f^)' EpeP, \\p,^i-i n 

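We construct another sequence of flats $\{\tilde{\mathcal{F}}_1, \dots, \tilde{\mathcal{F}}_j\}$ as follows.

1. Initially, $\tilde{\mathcal{F}}_1 = \mathcal{F}_1$.

2. For any $2 \le l \le j$, $\tilde{\mathcal{F}}_l = \mathrm{span}\{\mathcal{F}_l, \overrightarrow{ot_{v_1}}, \dots, \overrightarrow{ot_{v_{l-1}}}\}$.

From the above construction for $\tilde{\mathcal{F}}_l$, we have the following claim.

Claim (7). For any point $p \in P_l$, let $\tilde{p}$ be the corresponding point when $p$ is mapped back to $\mathbb{R}^d$.
Then, $\|p,\mathcal{F}_l\| = \|\tilde{p},\tilde{\mathcal{F}}_l\|$.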
From Claim (5), we know that each $\tilde{\mathcal{F}}_l$ is a $j$-dimensional subspace. From the algorithm, we
know that $\mathrm{span}\{\mathcal{F}_{l-1} \cap f_{v_{l-1}}, \overrightarrow{ot_{v_{l-1}}}\} = \mathcal{F}_{l-1}$ (see Fig. 7), which implies $\mathrm{span}\{\mathcal{F}_{l-1} \cap f_{v_{l-1}}, \overrightarrow{ot_{v_1}}, \dots,
\overrightarrow{ot_{v_{l-1}}}\} = \tilde{\mathcal{F}}_{l-1}$. Then by Claim (6) and (7), we have the following inequality.

Eii^'"^'ii'^(i + ^)'Eii^''-^'-iii'- (21) 

peP peP 



Meanwhile, we have X^pgp IIPi-^i 



Recursively using inequality (21), we have XlpeP < (1 + fz^)^'"'' Y^pepWP^ J^iW"^ ■ 



'<{l + l^fEpep\\P^^opt\\'. Thus, 



peP ^ peP 



2 



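By Claim (5), we know that $\mathcal{F}_j$ is a 1-dimensional flat (i.e., the single line spanned by $\overrightarrow{ot_{v_j}}$). Hence,
$\tilde{\mathcal{F}}_j = \mathrm{span}\{\overrightarrow{ot_{v_1}}, \dots, \overrightarrow{ot_{v_j}}\}$. Thus, if setting $\mathcal{F} = \tilde{\mathcal{F}}_j$, we have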

Y.\M"^^^ + ^f'Y.\\p^^opt\?- 
p&p p&p 



the success probability is (1 — ^)(1 



Success probability: Each time we use Theorem 

(we replace e by | tp increase the sample size from '^^^ to ) . Thus, the total success probability 

With the above theorem, we can easily have the following theorem (using the approach similar 
to Theorem [s]). 

Theorem 6. Let P be the point set of a regular (k, j)-projective clustering problem in M'^ with regular 
factor CO. If each optimal cluster has at least a\P\ points from P for some constant < a < ^, 
Algorithm Projective- Clustering yields a PTAS with constant probability, where the running time of 
the PTAS is 0{2P°^y^'^^nd). 

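12 Extension to $L_\tau$ Sense Projective Clustering

We first introduce the $L_\tau$ sense $\Delta$-rotation.

Definition 9 ($L_\tau$ Sense $\Delta$-Rotation). Let $P$ be a point set, $\mathcal{F}$ be a $j$-dimensional flat, and $u - o$
be a vector in $\mathbb{R}^d$ with $o \in \mathcal{F}$. Further, let $h^\tau = \frac{1}{|P|}\sum_{p\in P}\left|\left\langle p - o, \frac{Proj(u) - o}{\|Proj(u) - o\|}\right\rangle\right|^\tau$, and $\mathcal{F}'$ be the
rotation of $\mathcal{F}$ induced by $u - o$ with angle $\theta$, where $Proj(u)$ is the projection of $u$ on $\mathcal{F}$. Then, $\mathcal{F}'$ is a
$\Delta$-rotation of $\mathcal{F}$ (with respect to $P$) if $\theta \le \arctan\frac{\Delta}{h}$.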

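The following lemma shows how the value of $\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^\tau$ changes after an $L_\tau$ sense $\Delta$-rotation.

Lemma 8. Let $P$ be a point set, $\mathcal{F}$ be a $j$-dimensional flat, and $u$ be a point in $\mathbb{R}^d$. If $\mathcal{F}'$ is an $L_\tau$
sense $\Delta$-rotation (with respect to $P$) of $\mathcal{F}$ induced by the vector $u - o$ for some point $o \in \mathcal{F}$, then for
any integer $1 \le \tau < \infty$,

$$\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}'\|^\tau \le \Big(1 + \frac{\Delta}{\delta}\Big)^\tau \frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^\tau, \quad\text{where } \delta = \Big(\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^\tau\Big)^{1/\tau}.$$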
Before proving Lemma 8, we first introduce the following lemma.

Lemma 9. For any integer $\tau \ge 1$, and positive numbers $x$, $y$, $a$, $(x + ay)^\tau \le (1 + a)^{\tau-1}x^\tau + a(1 +
a)^{\tau-1}y^\tau$.

Proof. We prove this lemma by mathematical induction on $\tau$.

Base case: For $\tau = 1$, it is easy to see that $(x + ay)^1 = (1 + a)^0 x^1 + a(1 + a)^0 y^1 = (1 + a)^{\tau-1}x^\tau +
a(1 + a)^{\tau-1}y^\tau$. Thus, the base case holds.

Induction step: Assume that the inequality holds for $\tau \le \tau_0$ for some $\tau_0 \ge 1$ (i.e., induction
hypothesis). Now consider the case of $\tau = \tau_0 + 1$. By the induction hypothesis, we have

$$\begin{aligned}
(x + ay)^{\tau_0+1} &\le (x + ay)\big((1 + a)^{\tau_0-1}x^{\tau_0} + a(1 + a)^{\tau_0-1}y^{\tau_0}\big) \\
&= (1 + a)^{\tau_0-1}x^{\tau_0+1} + a(1 + a)^{\tau_0-1}(x^{\tau_0}y + xy^{\tau_0}) + a^2(1 + a)^{\tau_0-1}y^{\tau_0+1}.
\end{aligned} \tag{22}$$

Since both $x$ and $y$ are positive, we have $(x - y)(x^{\tau_0} - y^{\tau_0}) \ge 0$. Also, it is easy to know that

$$(x - y)(x^{\tau_0} - y^{\tau_0}) \ge 0 \implies x^{\tau_0+1} + y^{\tau_0+1} \ge x^{\tau_0}y + xy^{\tau_0}.$$

Thus, if replacing $x^{\tau_0}y + xy^{\tau_0}$ by $x^{\tau_0+1} + y^{\tau_0+1}$ in (22), we have

$$\begin{aligned}
(x + ay)^{\tau_0+1} &\le (1 + a)^{\tau_0-1}x^{\tau_0+1} + a(1 + a)^{\tau_0-1}(x^{\tau_0+1} + y^{\tau_0+1}) + a^2(1 + a)^{\tau_0-1}y^{\tau_0+1} \\
&= (1 + a)^{\tau_0}x^{\tau_0+1} + a(1 + a)^{\tau_0}y^{\tau_0+1}.
\end{aligned}$$

Hence, the inequality holds for $\tau = \tau_0 + 1$. □
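
As a sanity check, the inequality of Lemma 9 can be evaluated numerically on a grid of positive inputs. The sketch below is only an illustration; the function names and the choice of grid are hypothetical.

```python
def lemma9_sides(x: float, y: float, a: float, tau: int):
    """Return (lhs, rhs) of (x + a*y)^tau <= (1+a)^(tau-1) * (x^tau + a*y^tau)."""
    lhs = (x + a * y) ** tau
    rhs = (1.0 + a) ** (tau - 1) * x ** tau + a * (1.0 + a) ** (tau - 1) * y ** tau
    return lhs, rhs


def lemma9_holds_on_grid(values, max_tau: int) -> bool:
    """Check the inequality for all x, y, a in `values` and 1 <= tau <= max_tau."""
    for tau in range(1, max_tau + 1):
        for x in values:
            for y in values:
                for a in values:
                    lhs, rhs = lemma9_sides(x, y, a, tau)
                    if lhs > rhs * (1.0 + 1e-12):
                        return False
    return True
```

For $\tau = 1$ the two sides coincide, matching the base case of the induction.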

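Now we prove Lemma 8.

Proof (of Lemma 8). We use the same notations as in Definition 9. For any $p \in P$, let $Proj(p)$ be
its projection on $\mathcal{F}$, and $u_p = \left|\left\langle p - o, \frac{Proj(u) - o}{\|Proj(u) - o\|}\right\rangle\right|$. Then we have

$$\|p,\mathcal{F}'\| \le \|p - Proj(p)\| + \|Proj(p),\mathcal{F}'\| = \|p,\mathcal{F}\| + \|Proj(p),\mathcal{F}'\|.$$

Since the rotation angle from $\mathcal{F}$ to $\mathcal{F}'$ is $\theta \le \arctan\frac{\Delta}{h}$, we have $\|Proj(p),\mathcal{F}'\| = u_p\sin\theta \le
u_p\tan\theta \le \frac{\Delta}{h}u_p$. Let $\delta = \big(\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^\tau\big)^{1/\tau}$. Then, we have

$$\|p,\mathcal{F}'\|^\tau \le (\|p,\mathcal{F}\| + u_p\sin\theta)^\tau \le \Big(\|p,\mathcal{F}\| + \frac{\Delta}{h}u_p\Big)^\tau.$$

Using Lemma 9 with $x = \|p,\mathcal{F}\|$, $y = \frac{\delta}{h}u_p$, and $a = \frac{\Delta}{\delta}$, we have

$$\|p,\mathcal{F}'\|^\tau \le \Big(1 + \frac{\Delta}{\delta}\Big)^{\tau-1}\|p,\mathcal{F}\|^\tau + \frac{\Delta}{\delta}\Big(1 + \frac{\Delta}{\delta}\Big)^{\tau-1}\Big(\frac{\delta}{h}u_p\Big)^\tau. \tag{23}$$

Summing both sides of (23) over $p$, we have

$$\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}'\|^\tau \le \Big(1 + \frac{\Delta}{\delta}\Big)^{\tau-1}\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^\tau + \frac{\Delta}{\delta}\Big(1 + \frac{\Delta}{\delta}\Big)^{\tau-1}\Big(\frac{\delta}{h}\Big)^\tau\frac{1}{|P|}\sum_{p\in P}(u_p)^\tau.$$

Since $h^\tau = \frac{1}{|P|}\sum_{p\in P}(u_p)^\tau$ and $\delta^\tau = \frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^\tau$, the above inequality becomes

$$\begin{aligned}
\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}'\|^\tau &\le \Big(1 + \frac{\Delta}{\delta}\Big)^{\tau-1}\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^\tau + \frac{\Delta}{\delta}\Big(1 + \frac{\Delta}{\delta}\Big)^{\tau-1}\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^\tau \\
&= \Big(1 + \frac{\Delta}{\delta}\Big)^{\tau}\frac{1}{|P|}\sum_{p\in P}\|p,\mathcal{F}\|^\tau.
\end{aligned}$$

Thus the lemma is true. □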
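
The bound derived in this proof can be illustrated in the plane. The sketch below makes several simplifying assumptions that are not from the paper: $o$ is the origin, $\mathcal{F}$ is the $x$-axis (a 1-flat in $\mathbb{R}^2$), $\mathcal{F}'$ is the line through $o$ at angle $\theta$ with $0 \le \theta < \pi/2$, and $\Delta$ is taken as $h\tan\theta$, the smallest value for which $\theta \le \arctan(\Delta/h)$. Function names are hypothetical.

```python
import math


def power_mean(values, tau: int) -> float:
    """((1/n) * sum v^tau)^(1/tau) for nonnegative values."""
    return (sum(v ** tau for v in values) / len(values)) ** (1.0 / tau)


def rotation_bound_sides(points, theta: float, tau: int):
    """Return (lhs, rhs) of the Lemma 8 style bound for F = x-axis rotated by theta."""
    dist_f = [abs(y) for _, y in points]   # distance to the x-axis
    u_p = [abs(x) for x, _ in points]      # |<p - o, unit vector along F>|
    dist_f_rot = [abs(x * math.sin(theta) - y * math.cos(theta)) for x, y in points]
    delta = power_mean(dist_f, tau)        # requires some point off the x-axis
    h = power_mean(u_p, tau)
    big_delta = h * math.tan(theta)        # rotation parameter Delta
    lhs = sum(d ** tau for d in dist_f_rot) / len(points)
    rhs = (1.0 + big_delta / delta) ** tau * delta ** tau
    return lhs, rhs
```

With $\theta = 0$ the two flats coincide and both sides equal $\delta^\tau$.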

Using Lemma 8 and an approach similar to that of the $L_2$ case, we have the following theorem. Since the
ideas and proofs are almost the same, we omit them from the paper.

Theorem 7. Let P be the point set of an Lj- sense {k, j)-projective clustering problem in M'^ for inte- 
ger 1 < r < 00. Let Opt be the optimal objective value. With constant probability and in 0{2^°''^^^^nd) 
time, Algorithm Projective- Clustering outputs an approximation solution {J-i, • • • , J-k} such that each 
Fi is a j-fiat, and X^pep' ™iiii<;<A,. JiH"^ < (1 + e)Opt, where P' is a subset of P with at least 
(1 — 7)|P| points. 

Furthermore, we also have a similar result for Lr sense regular projective clustering. For the same 
reason, we omit the details for this case. 



Definition 10 ($L_\tau$ Sense Coefficient of Variation (CV)). Let $x$ be a random variable, and
$\mu = E[x]$. The coefficient of variation of $x$ is denoted as $\frac{(E[|x-\mu|^\tau])^{1/\tau}}{E[|x-\mu|]}$.



Lemma 10. Let X = {xi, • • • , a;„} be a set of n numbers with coefficient of variation lo, and S = 
{xji, • • • be a random sample of X. Also, let ^ = ^ Yl^=i^i '^"-^ ~ n X]r=i(^« ~ l^Y ■ Then, 

for any positive constant rj and integer 1 < r < 00, 



„ , \Xi, — Ll\ , /cl!^ — 1 

Pro6 ^'-^' " ^ > 1 - r?W - > 1 - ^. 

m V m w T]^ 

Theorem 8. Let $P$ be the point set of an $L_\tau$ sense regular $(k, j)$-projective clustering problem in $\mathbb{R}^d$
with regular factor $\omega$, where $1 \le \tau < \infty$ is an integer. If each optimal cluster has size at least $\alpha|P|$,
then Algorithm Projective-Clustering yields a PTAS with constant probability, where the running
time of the PTAS is 0{2P°^y^^'^nd). 